Re: [PATCH v4 6/8] fsverity: improve performance by using multibuffer hashing

Ard Biesheuvel <ardb@xxxxxxxxxx> · Thu, 6 Jun 2024 08:58:47 +0200

On Thu, 6 Jun 2024 at 07:41, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Jun 05, 2024 at 10:28:01PM -0700, Eric Biggers wrote:
> >
> > With AES, interleaving would only help with non-parallelizable modes such as CBC
> > encryption.  Anyone who cares about IPsec performance should of course be using
> > AES-GCM, which is parallelizable.  Especially since my other patch
> > https://lore.kernel.org/linux-crypto/20240602222221.176625-2-ebiggers@xxxxxxxxxx/
> > is making AES-GCM twice as fast...
>
> Algorithm selection may be limited by peer capability.  For IPsec,
> if SHA is being used, then most likely CBC is also being used.
>

IPsec users who rely on software crypto and authenc() and also care
about performance seem like a rather niche use case to me.

> > In any case, it seems that what you're asking for at this point is far beyond
> > the scope of this patchset.
>
> I'm more than happy to take this over if you don't wish to extend
> it beyond the storage usage cases.  According to the original Intel
> sha2-mb submission, this should result in at least a two-fold
> speed-up.
>

I'm struggling to follow this debate. Surely, if this functionality
needs to live in ahash, the shash fallbacks need to implement this
parallel scheme too, or ahash would end up just feeding the requests
into shash sequentially, defeating the purpose. It is then up to the
API client to choose between ahash or shash, just as it can today.
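
To make the "feeding the requests into shash sequentially" concern
concrete, here is a toy user-space sketch. It is not kernel code and does
not use the crypto API: the hash is 32-bit FNV-1a standing in for SHA-256,
and every name in it (toy_hash, toy_hash_sequential, toy_hash_interleaved,
toy_selfcheck) is made up for illustration. It only shows the structural
difference between a fallback that hashes one buffer after the other and
one that advances two hash states in the same loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a real hash: 32-bit FNV-1a. Not cryptographic. */
#define TOY_INIT  2166136261u
#define TOY_PRIME 16777619u

/* Hash a single buffer, analogous to one synchronous digest call. */
static uint32_t toy_hash(const uint8_t *data, size_t len)
{
	uint32_t h = TOY_INIT;

	for (size_t i = 0; i < len; i++)
		h = (h ^ data[i]) * TOY_PRIME;
	return h;
}

/*
 * Hypothetical naive fallback: two equal-length buffers are hashed one
 * after the other, so nothing is gained from receiving them together.
 */
static void toy_hash_sequential(const uint8_t *a, const uint8_t *b,
				size_t len, uint32_t out[2])
{
	out[0] = toy_hash(a, len);
	out[1] = toy_hash(b, len);
}

/*
 * Hypothetical multibuffer variant: both states advance in the same
 * loop iteration. The two dependency chains are independent, which is
 * what lets a real implementation interleave instructions.
 */
static void toy_hash_interleaved(const uint8_t *a, const uint8_t *b,
				 size_t len, uint32_t out[2])
{
	uint32_t h0 = TOY_INIT, h1 = TOY_INIT;

	for (size_t i = 0; i < len; i++) {
		h0 = (h0 ^ a[i]) * TOY_PRIME;
		h1 = (h1 ^ b[i]) * TOY_PRIME;
	}
	out[0] = h0;
	out[1] = h1;
}

/* Both strategies must produce identical digests for the same inputs. */
static void toy_selfcheck(void)
{
	static const uint8_t a[4] = { 1, 2, 3, 4 };
	static const uint8_t b[4] = { 9, 8, 7, 6 };
	uint32_t seq[2], mb[2];

	toy_hash_sequential(a, b, sizeof(a), seq);
	toy_hash_interleaved(a, b, sizeof(a), mb);
	assert(seq[0] == mb[0] && seq[1] == mb[1]);
}
```

The results are the same either way; only the interleaved form gives the
CPU two independent chains to overlap, which is the part a sequential
ahash-to-shash fallback would throw away.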

So Eric has a pretty strong case for his shash implementation;
kmap_local() is essentially a NOP on architectures that anyone still
cares about (unlike kmap_atomic() which still disables preemption), so
I don't have a problem with the caller relying on that in order to be
able to use shash directly. The whole scatterlist / request alloc
dance is just too tedious and pointless, given that in practice, it
all gets relegated to shash anyway.

But my point is that even if we go with Herbert's proposal for the
ahash, we'll still need something like Eric's code on the shash side.

For the true async accelerator use case, none of this should make any
difference, right? If the caller already tolerates async (but
in-order) completion, implementing this request chaining doesn't buy
it anything. So only when the caller is sync and the implementation is
async, we might be able to do something smart where the caller waits
on a single completion that signals the completion of a set of inputs.
But this is also rather niche, so not worth holding up this effort.
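
For reference, the "single completion that signals the completion of a
set of inputs" idea can be modelled with a pending counter. The sketch
below is a single-threaded user-space toy with made-up names (toy_batch,
toy_batch_init, toy_batch_req_done, toy_batch_is_done); a real version
would need an atomic counter and a proper wait primitive, and nothing
here reflects an existing kernel interface.

```c
#include <assert.h>

/* Tracks a set of requests that a sync caller wants to wait on as one. */
struct toy_batch {
	unsigned int pending;	/* requests not yet completed */
	int done;		/* becomes 1 when the last request completes */
};

/* Start a batch of nr requests; an empty batch is complete immediately. */
static void toy_batch_init(struct toy_batch *b, unsigned int nr)
{
	b->pending = nr;
	b->done = (nr == 0);
}

/*
 * Per-request completion callback. Only the final callback flips the
 * batch to done, so the caller has a single event to wait for.
 */
static void toy_batch_req_done(struct toy_batch *b)
{
	assert(b->pending > 0);
	if (--b->pending == 0)
		b->done = 1;
}

/* What the sync caller would poll or sleep on. */
static int toy_batch_is_done(const struct toy_batch *b)
{
	return b->done;
}
```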

So Herbert, what would the ahash_to_shash plumbing look like for the
ahash API that you are proposing? What changes would it require to
shash, and how much do they deviate from what Eric is suggesting?