Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> For now the API only supports 2-way interleaving, as the usefulness and
> practicality seems to drop off dramatically after 2. The arm64 CPUs I
> tested don't support more than 2 concurrent SHA-256 hashes. On x86_64,
> AMD's Zen 4 can do 4 concurrent SHA-256 hashes (at least based on a
> microbenchmark of the sha256rnds2 instruction), and it's been reported
> that the highest SHA-256 throughput on Intel processors comes from using
> AVX512 to compute 16 hashes in parallel. However, higher interleaving
> factors would involve tradeoffs such as no longer being able to cache
> the round constants in registers, further increasing the code size (both
> source and binary), further increasing the amount of state that users
> need to keep track of, and causing there to be more "leftover" hashes.

I think the lack of extensibility is the biggest problem with this
API. Now I confess I too have used the magic number 2 in the
lskcipher patch-set, but there I think at least it was more
justifiable based on the set of algorithms we currently support.

Here I think the evidence for limiting this to 2 is weak. And
extending this beyond 2 later would mean ripping this API out
again.

So let's get this right from the start. Rather than shoehorning
this into shash, how about we add this to ahash instead where an
async return is a natural part of the API?

In fact, if we do it there we don't need to make any major changes
to the API. You could simply add an optional flag to the request
flags to indicate that more requests will be forthcoming immediately.

The algorithm could then either delay the current request if it
is supported, or process it immediately as is the case now.

Cheers,
-- 
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
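
For illustration only, the "more requests are coming" flag idea described
in the message can be modelled as a toy user-space sketch in C. Nothing
here is kernel code or an existing crypto API: TOY_REQ_MORE, struct
toy_req, struct toy_alg, toy_submit() and the byte-sum "hash" are all
hypothetical names invented for this sketch. It only shows the control
flow: an implementation that can batch defers a flagged request until the
batch is full or an unflagged request arrives, while one that cannot
batch ignores the flag and processes each request immediately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag (not an existing kernel flag): the caller promises
 * that another request will be submitted immediately afterwards. */
#define TOY_REQ_MORE  0x1u

/* Interleaving factor of this toy implementation; an assumption. */
#define TOY_MAX_LANES 4

struct toy_req {
	unsigned int flags;
	const unsigned char *data;
	size_t len;
	unsigned int digest;	/* stand-in for a real hash result */
	int done;		/* set once the digest is valid */
};

struct toy_alg {
	int can_batch;		/* does this implementation support delaying? */
	struct toy_req *pending[TOY_MAX_LANES];
	size_t npending;
	unsigned int batches;	/* number of batch flushes, for illustration */
};

/* Stand-in "hash": a byte sum. A real driver would run its interleaved
 * SHA-256 code over all pending requests here. */
static void toy_hash_one(struct toy_req *req)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < req->len; i++)
		sum += req->data[i];
	req->digest = sum;
	req->done = 1;
}

/* Process every deferred request in one go. */
static void toy_flush(struct toy_alg *alg)
{
	if (alg->npending == 0)
		return;
	for (size_t i = 0; i < alg->npending; i++)
		toy_hash_one(alg->pending[i]);
	alg->npending = 0;
	alg->batches++;
}

/* Returns 1 if the request completed, 0 if it was deferred. */
static int toy_submit(struct toy_alg *alg, struct toy_req *req)
{
	req->done = 0;

	if (!alg->can_batch) {
		/* Flag is ignored: process immediately, as is the case now. */
		toy_hash_one(req);
		return 1;
	}

	assert(alg->npending < TOY_MAX_LANES);
	alg->pending[alg->npending++] = req;

	/* Delay only while more requests are promised and lanes remain. */
	if ((req->flags & TOY_REQ_MORE) && alg->npending < TOY_MAX_LANES)
		return 0;

	toy_flush(alg);
	return 1;
}
```

In this model the interleaving factor is a private detail of the
implementation (TOY_MAX_LANES), so callers do not need to change if it
grows beyond 2; they only set or clear the flag.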