I pondered for a long time whether to reply to this or not, but sorry, I
couldn't resist.

On Thu, Mar 10, 2011 at 05:18:42PM -0800, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
> You probably need to find some way
> to make pcrypt (parallel crypt layer) work for dmcrypt. That may
> actually give you more speedup too than your old hack because
> it can balance over more cores.

"My" old "hack" balances well as long as the number of stripes is equal to
or greater than the number of cores. And for my specific case... it's hard
to balance over more than 4 cores on a Core2Quad :)

> Or get a system with AES-NI -- that usually solves it too.

Honi soit qui mal y pense.

Of course I understand that Intel's primary goal is to sell new hardware,
and hence I understand that you are required to tell me this. However,
based on the AES-NI benchmarks from the linux-crypto ML, even with AES-NI
it would be hard, if not impossible, to regain my (non-AES-NI!) pre-.38
performance with the .38 dm-crypt parallelization approach.

> Frankly I don't think it's a very interesting case, the majority
> of workloads are not like that.

Well, I'm not sure we understand each other. My use case is probably a
little bit special, but that's not the point. The main point is that the
.38 dm-crypt parallelization approach kills performance on *every*
RAID0-over-dm-crypt setup. That setup is, I believe, not as uncommon as
you may think, because until .38 it was the only way to spread disk
encryption over multiple CPUs.

Up to .37, due to the lack of CPU affinity, accessing (reading or writing)
one stripe in the RAID0 always spread over min(#core, #kcryptd) cores. Now
with .38 the same access will always utilize only a single core, because
all the chunks of the stripe are (obviously) accessed on the same core.
Hence either the multiple underlying kcryptds now block each other (with
the old approach), or, with dm-crypt-over-RAID0, there is only one kcryptd
involved in serving one request on one core.
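To make that arithmetic concrete, here is a toy model. It is only an idealized sketch, not a measurement: it assumes the crypto work of one request splits evenly across cores, it ignores disk seek time and scheduling overhead, and all function names are made up for illustration.

```python
def old_cores_per_request(cores, kcryptds):
    # Up to .37: one stripe access spreads over min(#core, #kcryptd) cores.
    return min(cores, kcryptds)


def new_cores_busy(cores, parallel_requests):
    # .38: each request stays on a single core, so the number of busy
    # cores is bounded by the number of requests in flight.
    return min(cores, parallel_requests)


def requests_needed_to_match(cores, kcryptds):
    # Smallest number of concurrent requests at which the .38 model keeps
    # as many cores busy as the .37 model does for a single request.
    target = old_cores_per_request(cores, kcryptds)
    p = 1
    while new_cores_busy(cores, p) < target:
        p += 1
    return p


if __name__ == "__main__":
    cores, kcryptds = 4, 4  # e.g. a Core2Quad with four stripes
    print("cores busy for one request, .37 model:",
          old_cores_per_request(cores, kcryptds))
    print("cores busy for one request, .38 model:",
          new_cores_busy(cores, 1))
    print("parallel requests needed to match:",
          requests_needed_to_match(cores, kcryptds))
```

In this model a single request keeps one core busy under .38 regardless of how many cores or kcryptds exist, which is where the latency penalty comes from.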
Hence, for single requests the new approach always decreases throughput
and increases latency. The latency increase holds even for multi-process
workloads. For your approach to at least match the old one, it requires
min(#core, #kcryptd) parallel requests all the time, assuming that latency
doesn't matter and that disk seek time is zero (now you tell me to get
X25s, right? :)).

Mario
--
There are two major products that come from Berkeley: LSD and UNIX.
We don't believe this to be a coincidence.    -- Jeremy S. Anderson
_______________________________________________ dm-crypt mailing list dm-crypt@xxxxxxxx http://www.saout.de/mailman/listinfo/dm-crypt