Hi Eric,

On Sun, Oct 29, 2023 at 7:34 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> From: Eric Biggers <ebiggers@xxxxxxxxxx>
>
> Commit d1ac3ff008fb ("dm verity: switch to using asynchronous hash
> crypto API"), from Linux v4.12, made dm-verity do its hashing using the
> ahash API instead of the shash API. While this added support for
> hardware (off-CPU) hashing offload, it slightly hurt performance for
> everyone else due to additional crypto API overhead. This API overhead
> is becoming increasingly significant as I/O speeds increase and CPUs
> achieve increasingly high SHA-2 speeds using native SHA-2 instructions.
>
> Recent crypto API patches
> (https://lore.kernel.org/linux-crypto/20231022081100.123613-1-ebiggers@xxxxxxxxxx)
> are reducing that overhead. However, it cannot be eliminated.
>
> Meanwhile, another crypto API related sub-optimality of how dm-verity
> currently implements block hashing is that it always computes each hash
> using multiple calls to the crypto API. The most common case is:
>
>     1. crypto_ahash_init()
>     2. crypto_ahash_update() [salt]
>     3. crypto_ahash_update() [data]
>     4. crypto_ahash_final()
>
> With less common dm-verity settings, the update of the salt can happen
> after the data, or the data can require multiple updates.
>
> Regardless, each call adds some API overhead. Again, that's being
> reduced by recent crypto API patches, but it cannot be eliminated; each
> init, update, or final step necessarily involves an indirect call to the
> actual "algorithm", which is expensive on modern CPUs, especially when
> mitigations for speculative execution vulnerabilities are enabled.
>
> A significantly more optimal sequence for the common case is to do an
> import (crypto_ahash_import()), then a finup (crypto_ahash_finup()).
> This results in as few as one indirect call, the one for finup.
>
> Implementing the shash and import+finup optimizations independently
> would result in 4 code paths, which seems a bit excessive. This patch
> therefore takes a slightly simpler approach. It implements both
> optimizations, but only together. So, dm-verity now chooses either the
> existing, fully general ahash method; or it chooses the new shash
> import+finup method which is optimized for what most dm-verity users
> want: CPU-based hashing with the most common dm-verity settings.
>
> The new method is used automatically when appropriate, i.e. when the
> ahash API and shash APIs resolve to the same underlying algorithm, the
> dm-verity version is not 0 (so that the salt is hashed before the data),
> and the data block size is not greater than the page size.
>
> In benchmarks with veritysetup's default parameters (SHA-256, 4K data
> and hash block sizes, 32-byte salt), which also match the parameters
> that Android currently uses, this patch improves block hashing
> performance by about 15% on an x86_64 system that supports the SHA-NI
> instructions, or by about 5% on an arm64 system that supports the ARMv8
> SHA2 instructions. This was with CONFIG_CRYPTO_STATS disabled; an even
> larger improvement can be expected if that option is enabled.

That's an impressive performance improvement. Thanks for the patch!

Reviewed-by: Sami Tolvanen <samitolvanen@xxxxxxxxxx>

Sami