Hi,
On 8 March 2017 at 13:49, Binoy Jayan <binoy.jayan@xxxxxxxxxx> wrote:
> Hi Gilad,
>
>> I gave it a spin on a x86_64 with 8 CPUs with AES-NI using cryptd and
>> on Arm using CryptoCell hardware accelerator.
>>
>> There was no difference in performance between 512 and 4096 bytes
>> cluster size on the x86_64 (800 MB loop file system)
>>
>> There was an improvement in latency of 3.2% between 512 and 4096 bytes
>> cluster size on the Arm. I expect the performance benefits for this
>> test for Binoy's patch to be the same.
>>
>> In both cases the very naive test was a simple dd with a block size of
>> 4096 bytes on the raw block device.
>>
>> I do not know what effect having a bigger cluster size would have
>> on other more complex file system operations.
>> Is there any specific benchmark worth testing with?
The multiple instances issue in /proc/crypto is fixed. It was caused by
the IV code inadvertently modifying the algorithm name in the global
crypto algorithm lookup table while splitting "plain(cbc(aes))" into
"plain" and "cbc(aes)" in order to invoke the child algorithm.
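To make the fix easier to follow, here is a minimal user-space sketch of
the general approach (this is not the actual patch, and split_iv_name()
is a hypothetical helper): the composite name is parsed into the IV mode
and the child algorithm using a local copy, so the string owned by the
global lookup table is never written to.

/*
 * Hypothetical illustration, not the real kernel code: split a
 * composite name such as "plain(cbc(aes))" into the outer IV
 * generator ("plain") and the inner child algorithm ("cbc(aes)")
 * without modifying the source string.
 */
#include <stdio.h>
#include <string.h>

static int split_iv_name(const char *full, char *iv, size_t iv_len,
                         char *child, size_t child_len)
{
        const char *open = strchr(full, '(');
        const char *close = strrchr(full, ')');
        size_t n;

        if (!open || !close || close < open)
                return -1;

        /* Outer component: everything before the first '('. */
        n = (size_t)(open - full);
        if (n >= iv_len)
                return -1;
        memcpy(iv, full, n);
        iv[n] = '\0';

        /* Inner component: everything between the outermost parentheses. */
        n = (size_t)(close - open - 1);
        if (n >= child_len)
                return -1;
        memcpy(child, open + 1, n);
        child[n] = '\0';

        return 0;
}

int main(void)
{
        /* Stand-in for the name held in the global algorithm table. */
        static const char table_entry[] = "plain(cbc(aes))";
        char iv[32], child[64];

        if (split_iv_name(table_entry, iv, sizeof(iv),
                          child, sizeof(child)) == 0)
                printf("iv mode: %s, child algorithm: %s\n", iv, child);

        /*
         * table_entry is untouched here, unlike an in-place split that
         * writes a '\0' over the '(' in the shared string and thereby
         * corrupts what /proc/crypto reports.
         */
        return 0;
}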
I ran a few tests with dd, bonnie and FIO under QEMU (x86) using the
automated script [1] that I wrote to make the testing easy.
The tests were done on software implementations of the algorithms,
as the real hardware was not available to me. According to the tests,
sequential reads and writes show a good improvement (5.7%) in data
rate with the proposed solution, while random reads show very little
improvement. When tested with FIO, random writes also show a small
improvement (2.2%), but random reads show a slight deterioration in
performance (4%).
When tested on Arm hardware, only the sequential writes with bonnie
show an improvement (5.6%). All other tests show degraded performance
in the absence of crypto hardware.
Dependencies: dd [Full version], bonnie, fio

Thanks,
Binoy
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel