Hi Herbert, On 1/19/23 09:51, Herbert Xu wrote: > On Wed, Jan 18, 2023 at 09:41:08PM +0000, Dmitry Safonov wrote: >> Introduce a per-CPU pool of async crypto requests that can be used >> in bh-disabled contexts (designed with net RX/TX softirqs as users in >> mind). Allocation can sleep and is a slow-path. >> Initial implementation has only ahash as a backend and a fix-sized array >> of possible algorithms used in parallel. >> >> Signed-off-by: Dmitry Safonov <dima@xxxxxxxxxx> >> --- >> crypto/Kconfig | 3 + >> crypto/Makefile | 1 + >> crypto/crypto_pool.c | 333 ++++++++++++++++++++++++++++++++++++++++++ >> include/crypto/pool.h | 46 ++++++ >> 4 files changed, 383 insertions(+) >> create mode 100644 crypto/crypto_pool.c >> create mode 100644 include/crypto/pool.h > > I'm still nacking this. > > I'm currently working on per-request keys which should render > this unnecessary. With per-request keys you can simply do an > atomic kmalloc when you compute the hash. Adding per-request keys sounds like a real improvement to me. But that is not the same issue I'm addressing here. I'm maybe bad at describing or maybe I just don't see how per-request keys would help. Let me describe the problem I'm solving again and please feel free to correct inline or suggest alternatives. The initial need for crypto_pool comes from TCP-AO implementation that I'm pusing upstream, see RFC5925 that describes the option and the latest version of patch set is in [1]. In that patch set hashing is used in a similar way to TCP-MD5: crypto_alloc_ahash() is a slow-path in setsockopt() and the use of pre-allocated requests in fast path, TX/RX softirqs. For TCP-AO 2 algorithms are "must have" in any compliant implementation, according to RFC5926: HMAC-SHA-1-96 and AES-128-CMAC-96, other algorithms are optional. But having in mind that sha1, as you know, is not secure to collision attacks, some customers prefer to have/use stronger hashes. In other words, TCP-AO implementation needs 2+ hashing algorithms to be used in a similar manner as TCP-MD5 uses MD5 hashing. And than, I look around and I see that the same pattern (slow allocation of crypto request and usage on a fast-path with bh disabled) is used in other places over kernel: - here I convert to crypto_pool seg6_hmac & tcp-md5 - net/ipv4/ah4.c could benefit from it: currently it allocates crypto_alloc_ahash() per every connection, allocating user-specified hash algorithm with ahash = crypto_alloc_ahash(x->aalg->alg_name, 0, 0), which are not shared between each other and it doesn't provide pre-allocated temporary/scratch buffer to calculate hash, so it uses GFP_ATOMIC in ah_alloc_tmp() - net/ipv6/ah6.c is copy'n'paste of the above - net/ipv4/esp4.c and net/ipv6/esp6.c are more-or-less also copy'n'paste with crypto_alloc_aead() instead of crypto_alloc_ahash() - net/mac80211/ - another example of the same pattern, see even the comment in ieee80211_key_alloc() where the keys are allocated and the usage in net/mac80211/{rx,tx}.c with bh-disabled - net/xfrm/xfrm_ipcomp.c has its own manager for different compression algorithms that are used in quite the same fashion. The significant exception is scratch area: it's IPCOMP_SCRATCH_SIZE=65400. So, if it could be shared with other crypto users that do the same pattern (bh-disabled usage), it would save some memory. And those are just fast-grep examples from net/, looking closer it may be possible to find more potential users. So, in crypto_pool.c it's 333 lines where is a manager that let a user share pre-allocated ahash requests [comp, aead, may be added on top] inside bh-disabled section as well as share a temporary/scratch buffer. It will make it possible to remove some if not all custom managers of the very same code pattern, some of which don't even try to share pre-allocated tfms. That's why I see some value in this crypto-pool thing. If you NACK it, the alternative for TCP-AO patches would be to add just another pool into net/ipv4/tcp.c that either copies TCP-MD5 code or re-uses it. I fail to see how your per-request keys patches would provide an API alternative to this patch set. Still, users will have to manage pre-allocated tfms and buffers. I can actually see how your per-request keys would benefit *from* this patch set: it will be much easier to wire per-req keys up to crypto_pool to avoid per-CPU tfm allocation for algorithms you'll add support for. In that case you won't have to patch crypto-pool users. [1]: https://lore.kernel.org/all/20221027204347.529913-1-dima@xxxxxxxxxx/T/#u Thanks, waiting for your input, Dmitry