On Wed, Apr 27, 2011 at 2:19 AM, Milan Broz <mbroz@xxxxxxxxxx> wrote:
> Hi,
>
> On 04/26/2011 05:17 PM, Will Drewry wrote:
>> Recently, I've been benchmarking some different hardware crypto
>> accelerators and many of them appear to be tuned toward largish
>> requests (up to 16k) with a given key and a base IV.
>
> Please can you explicitly say which accelerators you are using and
> show some benchmarks?

Sure, I was looking at the in-development tegra_aes kernel module. As I
was trying to understand the available performance ignoring in-kernel
overhead, I wanted to push as much data as possible into each request
as I tweaked how the different pieces worked. The difference was a
change from 2.2 megabytes/s to 3.2 megabytes/s using a simplistic
synchronous dd + drop_caches scaffold. Without using conv=sync, the
performance difference was not as noticeable (for obvious reasons). A
more comprehensive benchmark suite, like iozone, would surface better
results, though.

> Does dmcrypt work in async crypto mode or do you also have
> "accelerators" like special instructions which run synchronously
> (like AES-NI)?

It was using the ablkcipher interface, but the engine itself is
synchronous, so requests were living on a queue in the driver waiting
to be serviced. For dm-crypt and above, everything appears
asynchronous. (Is that roughly answering the question? :)

> Of course a large block means smaller overhead, but the difference
> should not be significant (at least in theory).
> If it is, we need to know why - it can be because of timing or the
> way the requests are submitted, not the real encryption
> (initialization) itself.
>
> In this case the crypto driver should be optimised first.

Agreed. I was using this patch purely for driver optimization, which
is why I didn't post a prettier one with an optional table argument,
etc.
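For reference, a minimal Python stand-in for the "dd + drop_caches" scaffold mentioned above (the path, block size, and count are assumptions for illustration, not the exact commands used): drop the page cache if permitted, then time a streaming read and report megabytes per second.

```python
import time

def read_throughput(path, block_size=4096, count=25600):
    """Rough MB/s for a streaming read of `path` (a sketch, not dd)."""
    try:
        # Needs root; forces subsequent reads to actually hit the device.
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3")
    except OSError:
        pass  # not root / not Linux; cached data may inflate the result
    start = time.monotonic()
    total = 0
    with open(path, "rb") as f:
        for _ in range(count):
            buf = f.read(block_size)
            if not buf:
                break
            total += len(buf)
    elapsed = max(time.monotonic() - start, 1e-9)
    return total / elapsed / 1e6  # megabytes per second

# e.g. read_throughput("/dev/mapper/x", block_size=4096) as root
```

Unlike dd with iflag=direct, this still goes through the page cache, which is why dropping the cache between runs matters.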
>> I've created a very simple patch for dm-crypt that uses PAGE_SIZE
>> blocks to aid in the driver performance testing, but I lack the
>> cryptographic understanding to determine if there is significant
>> exposure by allowing a dm-crypt device to use a block size that
>> exceeds the sector size.
>
> As Arno said, there should be no real security problem for these
> block sizes. Basically we are just using CBC or XTS mode today.

Cool. (Also, sorry for not replying there. I failed to subscribe in
advance of the mailing and missed the response :/ )

> For XTS-AES, the definition explicitly says that the data unit
> (= your block) size should not exceed 2^20 128-bit blocks
> (128 bit = AES cipher block).
> (And even here possible attacks are closely related to the birthday
> bound, IOW you need to have enough blocks encrypted with the same
> key.)
>
> So I do not see a real security problem here. But there are problems
> elsewhere.

Thanks - nice to have that confirmed.

>> 1. Does anyone know if there will be significant exposure to the
>> plaintext if dm-crypt used larger block sizes?
>
> Should not be.
>
>> 2. Would an optional, configurable block-size (up to PAGE_SIZE) be
>> of interest?
>
> Short answer would be no :-)
>
> As I said, I would like to prove first that the problem is really in
> the block size and not in a related problem.
>
> Now the real problems:
>
> The whole device mapper and dmcrypt work as transparent block
> encryption and we are always operating on 512B sectors.
>
> Even if a device has 4k blocks, this is hidden in the underlying
> layer; DM just properly aligns data and propagates limits but still
> operates on 512B sectors. (It can be inefficient for some IO
> patterns, but it works.)
>
> Changing the encryption block size makes the device incompatible with
> other systems (note stacked devices, a common thing here - LVM over
> dmcrypt) and IOs. You have to generate only IO aligned to your
> encryption block size.
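The XTS-AES data-unit limit quoted above can be checked with a little arithmetic; PAGE_SIZE here is assumed to be the typical x86 value of 4096 bytes.

```python
# XTS-AES caps a data unit at 2^20 128-bit cipher blocks, which works
# out to 16 MiB -- vastly larger than any page-sized encryption block.
AES_BLOCK = 16                 # bytes in one 128-bit cipher block
MAX_UNIT_BLOCKS = 1 << 20      # data-unit cap from the XTS definition
max_unit_bytes = MAX_UNIT_BLOCKS * AES_BLOCK
print(max_unit_bytes)          # 16777216 bytes = 16 MiB

PAGE_SIZE = 4096               # typical x86 page size (assumption)
print(PAGE_SIZE // AES_BLOCK)  # 256 AES blocks per page
```

So even an 8k page (as on the Sparc mentioned later) sits comfortably under the limit; the objections that follow are about I/O semantics, not cryptographic exposure.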
> (or change dmcrypt significantly)
>
> IO hints are not enough - maybe an example is better here:
>
> Testing device (some random data there, not important)
> # dmsetup table --showkeys
> x: 0 417792 crypt aes-cbc-essiv:sha256 aeb26d1f69eb6dddfb9381eed4d7299f091e99aa5d3ff06866d4ce9f620f7aca 0 8:16 0
>
> Let's generate some direct IOs (to avoid the page cache)

Ah! I had forgotten about direct I/O. I was using fsync and
/proc/sys/vm/drop_caches to clear the page cache across each call.
direct is a whole other beast :/

> *Without* your patch:
>
> # dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4 -
>
> # dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4 -
>
> # dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4 -
>
> As you can see, we get the same plain data with different IO sizes.
>
> Now *with* your patch (page size is 4096):
>
> # dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum
> dd: reading `/dev/mapper/x': Invalid argument
>
> # dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum
> 4f4271e7799097b6e0ed66d81a8341163b8a5a06a2c57f50b930d429a7aa94d1 -
>
> # dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum
> 17cf9897059800f5b43af38766471048b872d20a0f565ee553a351b1a6251141 -
>
> So a block size of 512B causes the operation to fail (ok - IO hints).
> But IO of the encryption block size, and a multiple of it, now
> apparently return different data.
>
> This is probably not what we want...
>
> (Note that I did not even test cross-encryption-block operations.)

Hehe - not at all!
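One plausible way to picture why those digests diverge is with a toy model (NOT real crypto - just XOR against a hash-derived keystream; every name here is hypothetical): if the per-block IV ends up derived from the offset *within the I/O request* rather than the absolute device offset, re-reading the same bytes with a different request size "decrypts" with different IVs.

```python
import hashlib

SECTOR = 512  # dm-crypt's native sector size

def keystream(iv_index, length):
    """Stand-in 'cipher': hash of the IV index, stretched to length."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(b"key:%d:%d" % (iv_index, counter)).digest()
        counter += 1
    return out[:length]

def read_and_decrypt(ciphertext, request_size, iv_from_device_offset):
    """Simulate direct reads of `request_size`; return a digest of the result."""
    plain = b""
    for req in range(0, len(ciphertext), request_size):
        chunk = ciphertext[req:req + request_size]
        for i in range(0, len(chunk), SECTOR):
            # Correct: IV from absolute offset. Broken: IV from offset in request.
            iv = (req + i) // SECTOR if iv_from_device_offset else i // SECTOR
            ks = keystream(iv, SECTOR)
            plain += bytes(a ^ b for a, b in zip(chunk[i:i + SECTOR], ks))
    return hashlib.sha256(plain).hexdigest()

data = bytes(range(256)) * 32  # 8 KiB of stand-in ciphertext

# IV tied to the device offset: 4k and 8k reads agree, like the unpatched runs.
assert read_and_decrypt(data, 4096, True) == read_and_decrypt(data, 8192, True)

# IV tied to the request: 4k and 8k reads disagree, like the patched runs.
assert read_and_decrypt(data, 4096, False) != read_and_decrypt(data, 8192, False)
```

The model only illustrates the symptom; pinning down the actual cause in the patch would require inspecting how it derives IVs per bio.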
> Even if this is somehow solved, many other problems remain:
>
> - we need to extend the mapping table parameters so the block size
> is configurable (an encrypted device image must be readable on a
> system with a different page size; I have e.g. a Sparc with 8k page
> size).
> (This will be needed for other extensions anyway, so it is not a real
> problem, it just needs to be done first.)

Certainly - I definitely wouldn't want it page-size bound in general,
and I suspect that most consumers of dm-crypt would still want a
single-sector block size.

> - you need to store this block size info in the header;
> for LUKS it means using a new LUKS header version
> (requiring the parameter on the command line is dangerous - it must
> be enforced)

Ouch - so much for being lazy!

> I would really prefer to not support this yet and first try to
> optimize the crypto layer in such a way that it can process 512B
> blocks more efficiently (of course it will not fix bad hw, but it
> can help batch sector encryption, maybe using some hints, dunno).

That makes perfect sense to me. I'll keep using this patch for easy
testing of known (non-direct-io) test loads (or give in and just add
the drivers I'm playing with to the existing crypto test module :).

Thanks for the thorough and thoughtful response!
will

[I always seem to learn something new from mails to dm-*!]

_______________________________________________
dm-crypt mailing list
dm-crypt@xxxxxxxx
http://www.saout.de/mailman/listinfo/dm-crypt