On Wed, Aug 25, 2021 at 9:35 AM Leonard Crestez <cdleonard@xxxxxxxxx> wrote:
>
> On 25.08.2021 02:34, Eric Dumazet wrote:
> > On 8/24/21 2:34 PM, Leonard Crestez wrote:
> >> The crypto_shash API is used in order to compute packet signatures. The
> >> API comes with several unfortunate limitations:
> >>
> >> 1) Allocating a crypto_shash can sleep and must be done in user context.
> >> 2) Packet signatures must be computed in softirq context
> >> 3) Packet signatures use dynamic "traffic keys" which require exclusive
> >> access to crypto_shash for crypto_setkey.
> >>
> >> The solution is to allocate one crypto_shash for each possible cpu for
> >> each algorithm at setsockopt time. The per-cpu tfm is then borrowed from
> >> softirq context, signatures are computed and the tfm is returned.
> >>
> >
> > I could not see the per-cpu stuff that you mention in the changelog.
>
> That's a little embarrassing, I forgot to implement the actual per-cpu
> stuff. tcp_authopt_alg_imp.tfm is meant to be an array up to NR_CPUS and
> tcp_authopt_alg_get_tfm needs no locking other than preempt_disable
> (which should already be the case).

Well, do not use arrays of NR_CPUS; use the normal per_cpu accessors
instead (as in __tcp_alloc_md5sig_pool).

>
> The reference counting would still only happen from very few places:
> setsockopt, close and openreq. This would only impact request/response
> traffic and relatively little.

What I meant is that __tcp_alloc_md5sig_pool() allocates stuff one time;
we do not care about tcp_md5sig_pool_populated going back to false.

Otherwise, a single user application constantly allocating a socket,
enabling MD5 (or authopt), then closing the socket would incur a big
cost on hosts with a lot of cpus.

>
> Performance was not a major focus so far. Preventing impact on non-AO
> connections is important but typical AO usecases are long-lived
> low-traffic connections.
>
> --
> Regards,
> Leonard
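
To illustrate the allocate-once point made in the reply above, here is a
minimal userspace sketch. It is not kernel code and not the actual
__tcp_alloc_md5sig_pool() implementation. Every name in it (fake_tfm,
pool_alloc_once, fake_setsockopt_enable, fake_close) is hypothetical, the
per-cpu storage is approximated by a heap array sized with sysconf(), and
locking is omitted because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for one per-cpu crypto transform (tfm). */
struct fake_tfm {
    unsigned char key[32];
};

static struct fake_tfm *pool;   /* one slot per configured CPU */
static long pool_slots;         /* number of slots in the pool */
static bool pool_populated;     /* set once, never cleared */
static int pool_alloc_count;    /* how often the costly allocation ran */

/*
 * Populate the pool on first use. Later callers only see the flag.
 * Single-threaded sketch: a real implementation would need a lock here.
 */
static bool pool_alloc_once(void)
{
    if (!pool_populated) {
        long n = sysconf(_SC_NPROCESSORS_CONF);

        if (n < 1)
            n = 1;
        pool = calloc((size_t)n, sizeof(*pool));
        if (!pool)
            return false;
        pool_slots = n;
        pool_alloc_count++;
        pool_populated = true;
    }
    return true;
}

/* Enabling the option on a socket only makes sure the pool exists. */
static bool fake_setsockopt_enable(void)
{
    return pool_alloc_once();
}

/* Closing a socket deliberately leaves the pool in place. */
static void fake_close(void)
{
}

/* Borrow the slot that belongs to a given cpu index. */
static struct fake_tfm *pool_get(long cpu)
{
    assert(pool_populated && cpu >= 0 && cpu < pool_slots);
    return &pool[cpu];
}
```

With this shape, an application that opens a socket, enables the option,
and closes the socket in a loop pays the per-cpu allocation cost only on
the first iteration, at the price of never releasing the pool.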