On Wed, Apr 27, 2011 at 2:19 AM, Milan Broz <mbroz@xxxxxxxxxx> wrote:
> Hi,
>
> On 04/26/2011 05:17 PM, Will Drewry wrote:
>> Recently, I've been benchmarking some different hardware crypto
>> accelerators and many of them appear to be tuned toward largish
>> requests (up to 16k) with a given key and a base IV.
>
> Please can you explicitly say which accelerators you are using and
> show some benchmarks?

Sure, I was looking at the in-development tegra_aes kernel module. As I
was trying to understand the available performance ignoring in-kernel
overhead, I wanted to push as much data as possible into each request
as I tweaked how the different pieces worked. The difference was a
change from 2.2 megabytes/s to 3.2 megabytes/s using a simplistic
synchronous dd + drop_caches scaffold. Without using conv=sync, the
performance difference was not as noticeable (for obvious reasons). A
more comprehensive benchmark suite, like iozone, would surface better
results, though.

> Does dmcrypt work in async crypto mode or do you also have
> "accelerators" like special instructions which run synchronously
> (like AES-NI)?

It was using the ablkcipher interface, but the engine itself is
synchronous, so requests were living on a queue in the driver waiting
to be serviced. For dm-crypt and above, everything appears
asynchronous. (Is that roughly answering the question? :)

> Of course a large block means smaller overhead, but the difference
> should not be significant (at least in theory).
> If it is, we need to know why - it can be because of timing or the
> way the requests are submitted, not the real encryption
> (initialization) itself.
>
> In this case the crypto driver should be optimised first.

Agreed. I was using this patch purely for driver optimization, which
is why I didn't post a prettier one with an optional table argument,
etc.
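For reference, a minimal Python stand-in for the "dd + drop_caches" scaffold mentioned above (the path, block size, and count are assumptions for illustration, not the exact commands used): drop the page cache if permitted, then time a streaming read and report megabytes per second.

```python
import time

def read_throughput(path, block_size=4096, count=25600):
    """Rough MB/s for a streaming read of `path` (a sketch, not dd)."""
    try:
        # Needs root; forces subsequent reads to actually hit the device.
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3")
    except OSError:
        pass  # not root / not Linux; cached data may inflate the result
    start = time.monotonic()
    total = 0
    with open(path, "rb") as f:
        for _ in range(count):
            buf = f.read(block_size)
            if not buf:
                break
            total += len(buf)
    elapsed = max(time.monotonic() - start, 1e-9)
    return total / elapsed / 1e6  # megabytes per second

# e.g. read_throughput("/dev/mapper/x", block_size=4096) as root
```

Unlike dd with iflag=direct, this still goes through the page cache, which is why dropping the cache between runs matters.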
>> I've created a very simple patch for dm-crypt that uses PAGE_SIZE
>> blocks to aid in the driver performance testing, but I lack the
>> cryptographic understanding to determine if there is significant
>> exposure by allowing a dm-crypt device to use a block size that
>> exceeds the sector size.
>
> As Arno said, there should be no real security problem for these
> block sizes. Basically we are just using CBC or XTS mode today.

Cool. (Also, sorry for not replying there. I failed to subscribe in
advance of the mailing and missed the response :/ )

> For XTS-AES, the definition explicitly says that the data unit
> (= your block) size should not exceed 2^20 128-bit blocks
> (128 bit = AES cipher block).
> (And even here possible attacks are closely related to the birthday
> bound, IOW you need to have enough blocks encrypted with the same
> key.)
>
> So I do not see a real security problem here. But there are problems
> elsewhere.

Thanks - nice to have that confirmed.

>> 1. Does anyone know if there will be significant exposure to the
>> plaintext if dm-crypt used larger block sizes?
>
> Should not be.
>
>> 2. Would an optional, configurable block-size (up to PAGE_SIZE) be
>> of interest?
>
> Short answer would be no :-)
>
> As I said, I would like to prove first that the problem is really in
> the block size and not in a related problem.
>
> Now the real problems:
>
> The whole device mapper and dmcrypt work as transparent block
> encryption and we are always operating on 512B sectors.
>
> Even if a device has 4k blocks, this is hidden in the underlying
> layer; DM just properly aligns data and propagates limits but still
> operates on 512B sectors. (It can be inefficient for some IO
> patterns, but it works.)
>
> Changing the encryption block size makes the device incompatible with
> other systems (note stacked devices, a common thing here - LVM over
> dmcrypt) and IOs. You have to generate only IO aligned to your
> encryption block size.
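The XTS-AES data-unit limit quoted above can be checked with a little arithmetic; PAGE_SIZE here is assumed to be the typical x86 value of 4096 bytes.

```python
# XTS-AES caps a data unit at 2^20 128-bit cipher blocks, which works
# out to 16 MiB -- vastly larger than any page-sized encryption block.
AES_BLOCK = 16                 # bytes in one 128-bit cipher block
MAX_UNIT_BLOCKS = 1 << 20      # data-unit cap from the XTS definition
max_unit_bytes = MAX_UNIT_BLOCKS * AES_BLOCK
print(max_unit_bytes)          # 16777216 bytes = 16 MiB

PAGE_SIZE = 4096               # typical x86 page size (assumption)
print(PAGE_SIZE // AES_BLOCK)  # 256 AES blocks per page
```

So even an 8k page (as on the Sparc mentioned later) sits comfortably under the limit; the objections that follow are about I/O semantics, not cryptographic exposure.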
> (or change dmcrypt significantly)
>
> IO hints are not enough - maybe an example is better here:
>
> Testing device (some random data there, not important)
> # dmsetup table --showkeys
> x: 0 417792 crypt aes-cbc-essiv:sha256 aeb26d1f69eb6dddfb9381eed4d7299f091e99aa5d3ff06866d4ce9f620f7aca 0 8:16 0
>
> Let's generate some direct IOs (to avoid the page cache)

Ah! I had forgotten about direct I/O. I was using fsync and
/proc/sys/vm/drop_caches to clear the page cache across each call.
direct is a whole other beast :/

> *Without* your patch:
>
> # dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4 -
>
> # dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4 -
>
> # dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4 -
>
> As you can see, we get the same plain data with different IO sizes.
>
> Now *with* your patch (page size is 4096):
>
> # dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum
> dd: reading `/dev/mapper/x': Invalid argument
>
> # dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum
> 4f4271e7799097b6e0ed66d81a8341163b8a5a06a2c57f50b930d429a7aa94d1 -
>
> # dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum
> 17cf9897059800f5b43af38766471048b872d20a0f565ee553a351b1a6251141 -
>
> So a block size of 512B causes the operation to fail (ok - IO hints).
> But IO of the encryption block size, and a multiple of it, now
> apparently return different data.
>
> This is probably not what we want...
>
> (Note that I did not even test cross-encryption-block operations.)

Hehe - not at all!
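One plausible way to picture why those digests diverge is with a toy model (NOT real crypto - just XOR against a hash-derived keystream; every name here is hypothetical): if the per-block IV ends up derived from the offset *within the I/O request* rather than the absolute device offset, re-reading the same bytes with a different request size "decrypts" with different IVs.

```python
import hashlib

SECTOR = 512  # dm-crypt's native sector size

def keystream(iv_index, length):
    """Stand-in 'cipher': hash of the IV index, stretched to length."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(b"key:%d:%d" % (iv_index, counter)).digest()
        counter += 1
    return out[:length]

def read_and_decrypt(ciphertext, request_size, iv_from_device_offset):
    """Simulate direct reads of `request_size`; return a digest of the result."""
    plain = b""
    for req in range(0, len(ciphertext), request_size):
        chunk = ciphertext[req:req + request_size]
        for i in range(0, len(chunk), SECTOR):
            # Correct: IV from absolute offset. Broken: IV from offset in request.
            iv = (req + i) // SECTOR if iv_from_device_offset else i // SECTOR
            ks = keystream(iv, SECTOR)
            plain += bytes(a ^ b for a, b in zip(chunk[i:i + SECTOR], ks))
    return hashlib.sha256(plain).hexdigest()

data = bytes(range(256)) * 32  # 8 KiB of stand-in ciphertext

# IV tied to the device offset: 4k and 8k reads agree, like the unpatched runs.
assert read_and_decrypt(data, 4096, True) == read_and_decrypt(data, 8192, True)

# IV tied to the request: 4k and 8k reads disagree, like the patched runs.
assert read_and_decrypt(data, 4096, False) != read_and_decrypt(data, 8192, False)
```

The model only illustrates the symptom; pinning down the actual cause in the patch would require inspecting how it derives IVs per bio.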
> Even if this is somehow solved, many other problems remain:
>
> - we need to extend the mapping table parameters so the block size
> is configurable (an encrypted device image must be readable on a
> system with a different page size; I have e.g. a Sparc with 8k page
> size).
> (This will be needed for other extensions anyway, so it is not a real
> problem, it just needs to be done first.)

Certainly - I definitely wouldn't want it page-size bound in general,
and I suspect that most consumers of dm-crypt would still want a
single-sector block size.

> - you need to store this block size info in the header;
> for LUKS it means using a new LUKS header version
> (requiring the parameter on the command line is dangerous - it must
> be enforced)

Ouch - so much for being lazy!

> I would really prefer to not support this yet and first try to
> optimize the crypto layer in such a way that it can process 512B
> blocks more efficiently (of course it will not fix bad hw, but it
> can help batch sector encryption, maybe using some hints, dunno).

That makes perfect sense to me. I'll keep using this patch for easy
testing of known (non-direct-io) test loads (or give in and just add
the drivers I'm playing with to the existing crypto test module :).

Thanks for the thorough and thoughtful response!
will

[I always seem to learn something new from mails to dm-*!]

_______________________________________________
dm-crypt mailing list
dm-crypt@xxxxxxxx
http://www.saout.de/mailman/listinfo/dm-crypt