On 2025-01-15 18:29, Mikulas Patocka wrote:
Hi
The ahash interface is slower than the shash interface for synchronous
implementations, so the patch is basically slowing down the common
case.
See the upstream commit b76ad8844234bd0d394105d7d784cd05f1bf269a for an
explanation in dm-verity.
Do you have some benchmark that shows how much it helps on s390x, so
that we can evaluate whether the added complexity is worth the
performance improvement or not?
Mikulas
...
Hi Mikulas,
Here, finally, are some benchmarks measured on my development system:
an LPAR on a z16 with 16 CPUs and 32G memory, running a freshly built
Linux 6.13.0-rc7 kernel with and without my dm-integrity ahash patch.
For the dm-integrity format measurements I used a 16G file located in
tmpfs as the backing file for a loopback device. The backing file was
completely filled with random data from /dev/urandom. The dm-integrity
format command was
integritysetup format /dev/loop0 --integrity <alg> --sector-size 4096
6.13.0-rc7 with dm-integrity using shash:
sha256 Finished, time 00m09s, 15 GiB written, speed 1.8 GiB/s
hmac-sha256 Finished, time 00m09s, 15 GiB written, speed 1.7 GiB/s
6.13.0-rc7 with dm-integrity with my ahash patch:
sha256 Finished, time 00m09s, 15 GiB written, speed 1.7 GiB/s
hmac-sha256 Finished, time 00m09s, 15 GiB written, speed 1.6 GiB/s
In practice the read and write performance may be of more importance.
I set up an 8G file located in tmpfs as the backing file for a loopback
device, formatted it with dm-integrity and opened it. Then I created a
4G file of random data via dd if=/dev/urandom, also located in tmpfs.
For the write I used
dd if=<myrandomfile> of=/dev/mapper/<dm-integrity-name> oflag=direct,sync bs=4096 count=1M
to copy the 4G of random data into the dm-integrity block device.
For the read I used
dd if=/dev/mapper/<dm-integrity-name> of=/dev/null iflag=direct,sync bs=4096 count=1M
to copy 4G from the dm-integrity block device to /dev/null.
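For anyone who wants to repeat the read/write run, the steps above could
be scripted roughly as follows. This is only a sketch, not the script
used for the numbers in this mail: the tmpfs path /dev/shm, the file
names, the mapping name "integ-test", the use of truncate for the
backing file and the DRY_RUN switch are all assumptions. With DRY_RUN=1
(the default here) the commands are only printed. For hmac-sha256 a key
has to be supplied to integritysetup as well, which is left out here.

```shell
#!/bin/sh
# Sketch of the read/write measurement. Paths, names and the DRY_RUN
# switch are hypothetical; only the dd and format options come from
# the mail itself.
DRY_RUN=${DRY_RUN:-1}      # 1 = print commands only, 0 = execute (root)
ALG=${ALG:-sha256}
LOOPDEV=/dev/loop0
BACKING=/dev/shm/integrity-backing.img
RANDFILE=/dev/shm/random-4g.img
NAME=integ-test

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

bench() {
    # 8G backing file (assumption: sparse file) and 4G of random data
    run truncate -s 8G "$BACKING"
    run dd if=/dev/urandom of="$RANDFILE" bs=1M count=4096 iflag=fullblock
    run losetup "$LOOPDEV" "$BACKING"
    run integritysetup format "$LOOPDEV" --integrity "$ALG" --sector-size 4096
    run integritysetup open "$LOOPDEV" "$NAME" --integrity "$ALG"
    # write test, then read test
    run dd if="$RANDFILE" of="/dev/mapper/$NAME" oflag=direct,sync bs=4096 count=1M
    run dd if="/dev/mapper/$NAME" of=/dev/null iflag=direct,sync bs=4096 count=1M
    # tear down
    run integritysetup close "$NAME"
    run losetup -d "$LOOPDEV"
}

bench
```

Running it with DRY_RUN=0 needs root and a free /dev/loop0, and
integritysetup format will ask for confirmation before wiping the
device.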
6.13.0-rc7 with dm-integrity using shash:
sha256
WRITE: 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 45.5 s, 94.4 MB/s
READ: 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 19.2137 s, 224 MB/s
hmac-sha256
WRITE: 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 45.2026 s, 95.0 MB/s
READ: 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 19.2082 s, 224 MB/s
6.13.0-rc7 with dm-integrity with my ahash patch:
sha256
WRITE: 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 41.5273 s, 103 MB/s
READ: 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.2558 s, 264 MB/s
hmac-sha256
WRITE: 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 44.063 s, 97.5 MB/s
READ: 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.5381 s, 260 MB/s
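To put the dd figures above into relation, the throughput and the
relative gain can be recomputed from the byte count and the elapsed
times. The following is a small sketch; the helper names mbps and
gain_pct are made up for this illustration, the input values are the
ones reported above.

```shell
#!/bin/sh
# Recompute throughput and relative gain from the dd output above.
# Helper names are hypothetical; the numbers are taken from this mail.
LC_ALL=C; export LC_ALL
BYTES=4294967296

# Throughput in MB/s (10^6 bytes per second, the unit dd prints).
mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1e6 }'
}

# Throughput gain in percent when the same amount of data takes
# $2 seconds (ahash) instead of $1 seconds (shash).
gain_pct() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old / new - 1) * 100 }'
}

echo "sha256      write: $(mbps $BYTES 45.5) -> $(mbps $BYTES 41.5273) MB/s (+$(gain_pct 45.5 41.5273)%)"
echo "sha256      read:  $(mbps $BYTES 19.2137) -> $(mbps $BYTES 16.2558) MB/s (+$(gain_pct 19.2137 16.2558)%)"
echo "hmac-sha256 write: $(mbps $BYTES 45.2026) -> $(mbps $BYTES 44.063) MB/s (+$(gain_pct 45.2026 44.063)%)"
echo "hmac-sha256 read:  $(mbps $BYTES 19.2082) -> $(mbps $BYTES 16.5381) MB/s (+$(gain_pct 19.2082 16.5381)%)"
```

With the times above this works out to a gain of roughly 3% to 10% for
the writes and roughly 16% to 18% for the reads.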
I checked these results several times. They vary, but dm-integrity with
the ahash patch always gives the better figures. I also ran some
measurements with an isolated CPU and pinned the format or the dd task
to it. Pinning is not a good idea, though, because much of the work in
dm-integrity is done via workqueues, so the communication overhead
between the CPUs increases.
I would have expected a slight penalty with the ahash patch, as can be
seen in the dm-integrity format numbers, but read and write seem to
benefit from this simple ahash patch. It would be very interesting to
see how a real asynchronous implementation of dm-integrity performs.
If someone is interested, I can share my scripts for these measurements.
Harald Freudenberger