On Thu, 6 Jun 2024 at 07:41, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Jun 05, 2024 at 10:28:01PM -0700, Eric Biggers wrote:
> >
> > With AES, interleaving would only help with non-parallelizable modes such as CBC
> > encryption. Anyone who cares about IPsec performance should of course be using
> > AES-GCM, which is parallelizable. Especially since my other patch
> > https://lore.kernel.org/linux-crypto/20240602222221.176625-2-ebiggers@xxxxxxxxxx/
> > is making AES-GCM twice as fast...
>
> Algorithm selection may be limited by peer capability. For IPsec,
> if SHA is being used, then most likely CBC is also being used.
>

IPSec users relying on software crypto and authenc() and caring about
performance seems like a rather niche use case to me.

> > In any case, it seems that what you're asking for at this point is far beyond
> > the scope of this patchset.
>
> I'm more than happy to take this over if you don't wish to extend
> it beyond the storage usage cases. According to the original Intel
> sha2-mb submission, this should result in at least a two-fold
> speed-up.
>

I'm struggling to follow this debate. Surely, if this functionality
needs to live in ahash, the shash fallbacks need to implement this
parallel scheme too, or ahash would end up just feeding the requests
into shash sequentially, defeating the purpose. It is then up to the
API client to choose between ahash or shash, just as it can today.

So Eric has a pretty strong case for his shash implementation;
kmap_local() is essentially a NOP on architectures that anyone still
cares about (unlike kmap_atomic() which still disables preemption), so
I don't have a problem with the caller relying on that in order to be
able to use shash directly. The whole scatterlist / request alloc
dance is just too tedious and pointless, given that in practice, it
all gets relegated to shash anyway.

But my point is that even if we go with Herbert's proposal for the
ahash, we'll still need something like Eric's code on the shash side.

For the true async accelerator use case, none of this should make any
difference, right? If the caller already tolerates async (but
in-order) completion, implementing this request chaining doesn't buy
it anything. So only when the caller is sync and the implementation is
async, we might be able to do something smart where the caller waits
on a single completion that signals the completion of a set of inputs.
But this is also rather niche, so not worth holding up this effort.

So Herbert, what would the ahash_to_shash plumbing look like for the
ahash API that you are proposing? What changes would it require to
shash, and how much do they deviate from what Eric is suggesting?