On 1/7/23 02:04, Jakub Kicinski wrote:
> On Tue, 3 Jan 2023 18:42:54 +0000 Dmitry Safonov wrote:
>> Instead of having build-time hardcoded constant, reallocate scratch
>> area, if needed by user. Different algos, different users may need
>> different size of temp per-CPU buffer. Only up-sizing supported for
>> simplicity.
>
>> -static int crypto_pool_scratch_alloc(void)
>> +/* Slow-path */
>> +/**
>> + * crypto_pool_reserve_scratch - re-allocates scratch buffer, slow-path
>> + * @size: request size for the scratch/temp buffer
>> + */
>> +int crypto_pool_reserve_scratch(unsigned long size)
>
> Does this have to be a separate call? Can't we make it part of
> the pool allocation? AFAICT the scratch gets freed when last
> pool is freed, so the user needs to know to allocate the pool
> _first_ otherwise there's a potential race:
>
>   CPU 1                 CPU 2
>
>   alloc pool
>                         set scratch
>   free pool
>   [frees scratch]
>   alloc pool

Yeah, I think it will be cleaner if it were an argument to
crypto_pool_alloc_*(), and that would prevent the potential misuse you
describe. It also means that I don't have to declare
crypto_pool_scratch_alloc() in patch 1; I'll just add a new parameter to
the alloc function in this patch.

>
>>  {
>> -	int cpu;
>> -
>> -	lockdep_assert_held(&cpool_mutex);
>> +#define FREE_BATCH_SIZE		64
>> +	void *free_batch[FREE_BATCH_SIZE];
>> +	int cpu, err = 0;
>> +	unsigned int i = 0;
>>
>> +	mutex_lock(&cpool_mutex);
>> +	if (size == scratch_size) {
>> +		for_each_possible_cpu(cpu) {
>> +			if (per_cpu(crypto_pool_scratch, cpu))
>> +				continue;
>> +			goto allocate_scratch;
>> +		}
>> +		mutex_unlock(&cpool_mutex);
>> +		return 0;
>> +	}
>> +allocate_scratch:
>> +	size = max(size, scratch_size);
>> +	cpus_read_lock();
>>  	for_each_possible_cpu(cpu) {
>> -		void *scratch = per_cpu(crypto_pool_scratch, cpu);
>> +		void *scratch, *old_scratch;
>>
>> -		if (scratch)
>> +		scratch = kmalloc_node(size, GFP_KERNEL, cpu_to_node(cpu));
>> +		if (!scratch) {
>> +			err = -ENOMEM;
>> +			break;
>> +		}
>> +
>> +		old_scratch = per_cpu(crypto_pool_scratch, cpu);
>> +		/* Pairs with crypto_pool_get() */
>> +		WRITE_ONCE(*per_cpu_ptr(&crypto_pool_scratch, cpu), scratch);
>
> You're using RCU for protection here, please use rcu accessors.

Will do.

>
>> +		if (!cpu_online(cpu)) {
>> +			kfree(old_scratch);
>>  			continue;
>> +		}
>> +		free_batch[i++] = old_scratch;
>> +		if (i == FREE_BATCH_SIZE) {
>> +			cpus_read_unlock();
>> +			synchronize_rcu();
>> +			while (i > 0)
>> +				kfree(free_batch[--i]);
>> +			cpus_read_lock();
>> +		}
>
> This is a memory allocation routine, can we simplify this by
> dynamically sizing "free_batch" and using call_rcu()?
>
> struct humf_blah {
> 	struct rcu_head rcu;
> 	unsigned int cnt;
> 	void *data[];
> };
>
> cheezit = kmalloc(struct_size(blah, data, num_possible_cpus()));
>
> for_each ..
> 	cheezit->data[cheezit->cnt++] = old_scratch;
>
> call_rcu(&cheezit->rcu, my_free_them_scratches)
>
> etc.
>
> Feels like that'd be much less locking, unlocking and general
> carefully'ing.

Will give it a try for v3, thanks for the idea and review,
          Dmitry
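
P.S.: to double-check I read the first suggestion right, the allocation
side could end up looking roughly like the prototype below. The exact
signature (second parameter, its type) is a placeholder for
illustration, not the final v3 interface:

/* Hypothetical prototype: the scratch size is requested together with
 * the pool itself, so the buffer can never be (re)sized without holding
 * a pool reference, which closes the race above. Still up-sizing only.
 */
int crypto_pool_alloc_ahash(const char *alg, unsigned long scratch_size);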
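
And for the slow path, a rough sketch of the call_rcu() batching idea,
assuming cpool_mutex is held by the caller (as the original
crypto_pool_scratch_alloc() asserted). The struct and function names
below are made up for the example, the early-return/max(size,
scratch_size) handling from the patch is omitted for brevity, and it
assumes the cpu_online()/cpus_read_lock() special-casing can go away
once every replaced buffer is simply queued to RCU:

#include <linux/cpumask.h>
#include <linux/kernel.h>
#include <linux/mutex.h>
#include <linux/overflow.h>
#include <linux/percpu.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/topology.h>

/* Re-declared here so the sketch is self-contained; the __rcu
 * annotation goes together with switching to the rcu accessors.
 */
static DEFINE_MUTEX(cpool_mutex);
static DEFINE_PER_CPU(void __rcu *, crypto_pool_scratch);

struct scratch_free_batch {
	struct rcu_head	rcu;
	unsigned int	cnt;
	void		*old[];
};

static void crypto_pool_free_scratch_rcu(struct rcu_head *head)
{
	struct scratch_free_batch *batch =
		container_of(head, struct scratch_free_batch, rcu);

	/* Runs after a grace period: no CPU can still see the old buffers */
	while (batch->cnt > 0)
		kfree(batch->old[--batch->cnt]);
	kfree(batch);
}

/* Caller holds cpool_mutex; serializes against other re-allocations */
static int crypto_pool_realloc_scratch(unsigned long size)
{
	struct scratch_free_batch *batch;
	int cpu, err = 0;

	lockdep_assert_held(&cpool_mutex);

	/* One slot per possible CPU covers every buffer we may replace */
	batch = kzalloc(struct_size(batch, old, num_possible_cpus()),
			GFP_KERNEL);
	if (!batch)
		return -ENOMEM;

	for_each_possible_cpu(cpu) {
		void *scratch, *old_scratch;

		scratch = kmalloc_node(size, GFP_KERNEL, cpu_to_node(cpu));
		if (!scratch) {
			err = -ENOMEM;
			break;
		}

		/* Pairs with crypto_pool_get(), once it uses rcu_dereference() */
		old_scratch = rcu_replace_pointer(per_cpu(crypto_pool_scratch, cpu),
						  scratch,
						  lockdep_is_held(&cpool_mutex));
		if (old_scratch)
			batch->old[batch->cnt++] = old_scratch;

		/* Assumption: no cpu_online() check needed, offline CPUs'
		 * old buffers are just queued and freed with the rest.
		 */
	}

	/* Replaced buffers (if any) are freed after a grace period */
	call_rcu(&batch->rcu, crypto_pool_free_scratch_rcu);

	return err;
}

Everything to be freed is handed to RCU in one go, so there is no
synchronize_rcu() wait, no FREE_BATCH_SIZE limit and no dropping and
re-taking cpus_read_lock() in the middle of the loop.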