Hi Herbert,

On 7/27/22 01:17, Herbert Xu wrote:
> On Tue, Jul 26, 2022 at 09:15:54PM +0100, Dmitry Safonov wrote:
>> Add crypto_pool - an API for allocating per-CPU array of crypto requests
>> on slow-path (in sleep'able context) and to use them on a fast-path,
>> which is RX/TX for net/ users (or in any other bh-disabled users).
>> The design is based on the current implementations of md5sig_pool.
>>
>> Previously, I've suggested to add such API on TCP-AO patch submission [1],
>> where Herbert kindly suggested to help with introducing new crypto API.
>
> What I was suggesting is modifying the actual ahash interface so
> that the tfm can be shared between different key users by moving
> the key into the request object.

My impression here is that we're looking at different issues:
1. The necessity of allocating per-CPU ahash_requests.
2. Managing the lifetime and sharing of ahash_request between different
   kernel users.

Removing (1) will allow saving
(num_possible_cpus() - 1) * (sizeof(struct ahash_request) + crypto_ahash_reqsize(tfm))
bytes, which would be very nice for the new fancy CPUs with hundreds of
threads.

For (2), many kernel users try to manage it themselves, resulting in
different implementations. Some users also avoid any complication such
as ref counting by allocating the request only once and never freeing
it until the module is unloaded.

Here, for example, introducing TCP-AO would result in copy'n'paste of
the tcp_md5sig_pool code. Also, RFC5925 lets a TCP-AO user have any
supported hashing algorithm, while RFC5926 requires hmac(sha1) &
cmac(aes). If a user wants more algorithms, that implementation would
need to be patched.

I see quite a few net/ users besides TCP-MD5 and TCP-AO that could use
some common API for this. They share the same pattern of allocating a
crypto algorithm on a slow-path (adding a key or module initialization)
and using it on a fast-path, which is RX/TX.

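To put a rough number on the saving from (1), here is a minimal
userspace sketch of that expression. It is only an illustration:
percpu_request_savings() is a hypothetical helper, and the sizes passed
in stand in for sizeof(struct ahash_request) and
crypto_ahash_reqsize(tfm), which depend on the kernel build and on the
chosen algorithm.

```c
#include <stddef.h>

/*
 * Bytes saved if one request were shared instead of one per CPU.
 *
 * possible_cpus   - stands in for num_possible_cpus()
 * req_struct_size - stands in for sizeof(struct ahash_request)
 * reqsize         - stands in for crypto_ahash_reqsize(tfm)
 *
 * All three are caller-supplied placeholders (assumptions), not values
 * read from a real kernel.
 */
size_t percpu_request_savings(size_t possible_cpus,
			      size_t req_struct_size,
			      size_t reqsize)
{
	/* With zero or one CPU there is nothing to save. */
	if (possible_cpus <= 1)
		return 0;

	return (possible_cpus - 1) * (req_struct_size + reqsize);
}
```

With 256 possible CPUs and assumed sizes of 80 and 200 bytes,
percpu_request_savings(256, 80, 200) gives 255 * 280 = 71400 bytes.
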
Besides sharing and lifetime management, those users need a temporary
buffer (usually named `scratch'). IIUC, it is needed for async
algorithms that could use some hardware accelerator instead of the CPU
and need to write the result anywhere but on a vmapped stack.

So, here I'm trying to address (2) in order to avoid copy'n'pasting of
the tcp_md5sig_pool code when introducing TCP-AO support. I've also
patched the tcp-md5 code to dynamically disable the static branch, which
is not a crypto change.

There's also a chance I've misunderstood what your proposal is :-)

Thanks,
          Dmitry