At Thu, 9 Nov 2006 15:22:54 +0100, "Marco Costa" <costa.marco@xxxxxxxxx> wrote:
>
> Hi!
>
> On 11/9/06, Gerald Hopf <dm-crypt@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > I am currently using AES-128 encryption and am able to get about 31MB/s
> > sustained performance on my single-core Athlon 64 3000+ (when copying
> > from the encrypted volume to a fast non-encrypted volume).
> >
> > Since 31MB/s is well below the limits of gigabit ethernet, I would like
> > to get even more performance :-)
>
> It seems that you are already at the hardware limit of the harddrive.
> The overhead of encrypting your filesystem is very tiny.
> I may be wrong, though. ;-)

That's completely wrong. You are not even close to the hardware limit:

/dev/hda:
 Timing buffered disk reads:  168 MB in  3.03 seconds = 55.48 MB/sec

(regular 7200rpm disk)

Even worse, you are not even close to the processor's encryption speed
limit:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc     76228.13k    78740.37k    81087.48k    80974.17k    81265.19k

I happen to have the same processor as Gerald, and 81 MB/s is the
encryption speed for AES with a 128-bit key in CBC mode.

The overall problem with dm-crypt (and cryptoloop and all the stuff
before it) is that decryption is done by a separate worker thread. The
reason for that is that reading needs post-processing, namely
decryption, which cannot be done in the interrupt handler that is
called when the read finishes. Hence, the work is offloaded to a
worker thread, and that worker thread, when finished with decryption,
calls the I/O completion handler that usually wakes up the calling
process. This additional scheduling seems to slow down the whole
process enough that the performance is horrible. For writing, you
should get full speed, because writing does not need any
post-processing. This grave difference can be seen in most of the
performance benchmarks posted for dm-crypt.

I have spent a few quarters of an hour thinking about potential
solutions.
The overall goal is clear: do the post-processing in the process that
has caused the I/O, so the context switch can be avoided. My idea is:

1) Add a CAN_POSTPROCESS flag to I/O requests, as well as a
postprocess function pointer, initialized to a function that just does
nothing.

2) Whenever dm-crypt sees a request that has the CAN_POSTPROCESS flag
on it, it does not enqueue the work for the worker thread, but instead
modifies the postprocessing handler to point to its own decryption
routine. The decryption routine would of course call the original
postprocessing handler when finished.

3) Profile the kernel code for hot spots in terms of I/O request
submission. Modify this code from

  do_io_and_sleep(request);

to

  request->flags |= CAN_POSTPROCESS;
  do_io_and_sleep(request);
  request->postprocess(..);

That effectively offloads the work to the processes causing the I/O.
It also avoids the additional scheduling and work queuing in dm-crypt.
Flagging the capabilities of the caller allows a seamless transition
to the new model.

I'm posting this idea in a totally premature state. I know that, but I
have no intention to implement it, as I'm done with kernel hacking.
But if someone is eager to work out the details, here is at least an
inspiration for how it could possibly work.

To answer the original question: dual-core will not improve your
dm-crypt performance (at least for now), because there is only one
kcryptd thread per device. This might change with the workqueue stuff
that is about to be merged; I'm not following that development.
--
Fruhwirth Clemens - http://clemens.endorphin.org
for robots: sp4mtrap@xxxxxxxxxxxxx

---------------------------------------------------------------------
dm-crypt mailing list - http://www.saout.de/misc/dm-crypt/
To unsubscribe, e-mail: dm-crypt-unsubscribe@xxxxxxxx
For additional commands, e-mail: dm-crypt-help@xxxxxxxx