On Tue, Jun 11, 2024 at 05:46:01PM +0200, Ard Biesheuvel wrote:
>
> The issue here is that the CPU based multibuffer approach has rather
> tight constraints in terms of input length and the shared prefix, and
> so designing a more generic API based on ahash doesn't help at all.
> The intel multibuffer code went off into the weeds entirely attempting
> to apply this parallel scheme to arbitrary combinations of inputs, so
> this is something we know we should avoid.

The sha-mb approach failed because it did not aggregate the data
properly. By driving this from the data sink, it was doomed to fail.

The correct way to aggregate data is to do it at the source. The
user (of the Crypto API) knows exactly how much data they want to
hash and how it's structured. They should be supplying that info
to the API so it can use multi-buffer where applicable.

Even where multi-buffer isn't available, they would at least benefit
from making a single indirect call into the Crypto stack instead of
N calls. When N is large (which is almost always the case for TCP)
this produces a non-trivial saving.

Sure, I understand that you guys are more than happy with N=2, but
please let me at least try this out and see if we could make this
work for a large value of N.

Cheers,
-- 
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt