On Wed, Nov 01, 2023 at 12:51:15PM +0800, Herbert Xu wrote: > Eric Biggers <ebiggers@xxxxxxxxxx> wrote: > > > > + driver_name = crypto_ahash_driver_name(ahash); > > + if (v->version >= 1 /* salt prepended, not appended? */ && > > + 1 << v->data_dev_block_bits <= PAGE_SIZE) { > > + shash = crypto_alloc_shash(alg_name, 0, 0); > > I'm sorry but this is disgusting. > > Can't we do the same optimisation for the ahash import+finup path? That would provide the import+finup optimization, but it would retain the overhead of ahash compared to shash. That overhead includes a small amount of extra CPU time as well as the bio size being increased by sizeof(struct ahash_request). It's a small difference, but dm-verity performance is really important. Note that dm-verity also has to use an ugly and error-prone workaround to use ahash at all, since its hash blocks can be cached in vmalloc memory; see verity_hash_update(). shash handles vmalloc memory much more naturally, since no translation from vmalloc address to page to linear address is needed. I'm thinking that avoiding the extra overhead of ahash, combined with avoiding the workaround for vmalloc memory, makes it worthwhile to use shash for the import+export code path. If it was just ahash vs. shash by itself, we wouldn't bother, but the fact that we can combine the two optimizations seems attractive. > > On a side note, if we're going to use shash for bulk data then we > should reintroduce the can/cannot sleep flag. This patch limits the use of shash to blocks of at most PAGE_SIZE (*), which is the same as what ahash does internally. (*) Except when the data block size is <= PAGE_SIZE but the hash block size is > PAGE_SIZE. I think that's an uncommon configuration that's not worth worrying too much about, but it could be excluded from the shash-based code. - Eric