On Tue, Nov 03, 2020 at 12:54:22PM -0500, Joel Fernandes wrote:
> On Thu, Oct 29, 2020 at 05:50:04PM +0100, Uladzislau Rezki (Sony) wrote:
> > The current memory-allocation interface presents the following
> > difficulties that this patch is designed to overcome
> [...]
> > ---
> >  kernel/rcu/tree.c | 109 ++++++++++++++++++++++++++++------------------
> >  1 file changed, 66 insertions(+), 43 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 06895ef85d69..f2da2a1cc716 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -177,7 +177,7 @@ module_param(rcu_unlock_delay, int, 0444);
> >   * per-CPU. Object size is equal to one page. This value
> >   * can be changed at boot time.
> >   */
> > -static int rcu_min_cached_objs = 2;
> > +static int rcu_min_cached_objs = 5;
> >  module_param(rcu_min_cached_objs, int, 0444);
> >
> >  /* Retrieve RCU kthreads priority for rcutorture */
> > @@ -3084,6 +3084,9 @@ struct kfree_rcu_cpu_work {
> >   *    In order to save some per-cpu space the list is singular.
> >   *    Even though it is lockless an access has to be protected by the
> >   *    per-cpu lock.
> > + * @page_cache_work: A work to refill the cache when it is empty
> > + * @work_in_progress: Indicates that page_cache_work is running
> > + * @hrtimer: A hrtimer for scheduling a page_cache_work
> >   * @nr_bkv_objs: number of allocated objects at @bkvcache.
> >   *
> >   * This is a per-CPU structure. The reason that it is not included in
> > @@ -3100,6 +3103,11 @@ struct kfree_rcu_cpu {
> >      bool monitor_todo;
> >      bool initialized;
> >      int count;
> > +
> > +    struct work_struct page_cache_work;
> > +    atomic_t work_in_progress;
>
> Does it need to be atomic? run_page_cache_work() is only called under a lock.
> You can use xchg() there. And when you do the atomic_set, you can use
> WRITE_ONCE as it is a data-race.
>
We can use xchg() together with the *_ONCE() macros. Could you please clarify
what your concern is about using atomic_t?
Both xchg() and atomic_xchg() guarantee atomicity. The same applies to
WRITE_ONCE() and atomic_set().

> > @@ -4449,24 +4482,14 @@ static void __init kfree_rcu_batch_init(void)
> >
> >      for_each_possible_cpu(cpu) {
> >          struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > -        struct kvfree_rcu_bulk_data *bnode;
> >
> >          for (i = 0; i < KFREE_N_BATCHES; i++) {
> >              INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
> >              krcp->krw_arr[i].krcp = krcp;
> >          }
> >
> > -        for (i = 0; i < rcu_min_cached_objs; i++) {
> > -            bnode = (struct kvfree_rcu_bulk_data *)
> > -                __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > -
> > -            if (bnode)
> > -                put_cached_bnode(krcp, bnode);
> > -            else
> > -                pr_err("Failed to preallocate for %d CPU!\n", cpu);
> > -        }
> > -
> >          INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
> > +        INIT_WORK(&krcp->page_cache_work, fill_page_cache_func);
> >          krcp->initialized = true;
>
> During initialization, is it not better to still pre-allocate? That way you
> don't have to wait to get into a situation where you need to initially
> allocate.
>
Since we have a worker that refills the cache when it is empty, there is no
strong need to do it during the initialization phase. If we can reduce the
amount of code, that is always good :)

Thanks, Joel.

--
Vlad Rezki