+Vlad

Hi Sebastian,

On Wed, Apr 15, 2020 at 06:00:32PM +0200, Sebastian Andrzej Siewior wrote:
> The per-CPU variable is initialized at runtime in
> kfree_rcu_batch_init(). This function is invoked before
> `rcu_scheduler_active' is set to `RCU_SCHEDULER_RUNNING'. After the
> initialisation, `->initialized' is set to true.
>
> The spin_lock is only acquired if `->initialized' is set to true. The
> workqueue item is only used if `rcu_scheduler_active' is set to
> `RCU_SCHEDULER_RUNNING', which happens after initialisation.
>
> Use a static initializer for krc.lock and remove the runtime
> initialisation of the lock. Since the lock can now always be acquired,
> remove the `->initialized' check.
> The local_irq_save() is removed and raw_cpu_ptr() + spin_lock_irqsave()
> is used. The worst case scenario is that after raw_cpu_ptr() the code
> has been moved to another CPU. This is "okay" because the data structure
> itself is protected with a lock.
> Add a warning in kfree_rcu_batch_init() to ensure that this function is
> invoked before `RCU_SCHEDULER_RUNNING' to ensure that the worker is not
> used earlier.
>
> Reported-by: Mike Galbraith <efault@xxxxxx>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> ---
>  kernel/rcu/tree.c | 22 +++++++---------------
>  1 file changed, 7 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f288477ee1c26..5b0b63dd04b02 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2893,7 +2893,6 @@ struct kfree_rcu_cpu_work {
>   * @lock: Synchronize access to this structure
>   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
>   * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
> - * @initialized: The @lock and @rcu_work fields have been initialized
>   *
>   * This is a per-CPU structure. The reason that it is not included in
>   * the rcu_data structure is to permit this code to be extracted from
> @@ -2908,12 +2907,13 @@ struct kfree_rcu_cpu {
>  	spinlock_t lock;
>  	struct delayed_work monitor_work;
>  	bool monitor_todo;
> -	bool initialized;
>  	// Number of objects for which GP not started
>  	int count;
>  };
>
> -static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
> +static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
> +	.lock = __SPIN_LOCK_UNLOCKED(krc.lock),
> +};
>
>  static __always_inline void
>  debug_rcu_head_unqueue_bulk(struct rcu_head *head)
> @@ -3080,9 +3080,6 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
>  {
>  	struct kfree_rcu_bulk_data *bnode;
>
> -	if (unlikely(!krcp->initialized))
> -		return false;
> -
>  	lockdep_assert_held(&krcp->lock);
>
>  	/* Check if a new block is required. */
> @@ -3139,10 +3136,8 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	unsigned long flags;
>  	struct kfree_rcu_cpu *krcp;
>
> -	local_irq_save(flags);	// For safely calling this_cpu_ptr().
> -	krcp = this_cpu_ptr(&krc);
> -	if (krcp->initialized)
> -		spin_lock(&krcp->lock);
> +	krcp = raw_cpu_ptr(&krc);
> +	spin_lock_irqsave(&krcp->lock, flags);

I agree with the patch, except for this bit. Would it be OK to split the
other parts of this patch into separate patch(es)?

For this part of the patch, I am wondering what the value in it is. For
one, we don't want migration between sampling the value of the pointer and
actually using it. One reason, at least, is that certain resources in the
per-CPU structure take advantage of cache locality (such as the
kfree_rcu_cpu::krw_arr array).
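Just to make sure we are talking about the same thing, here is a minimal
sketch of the two access patterns (illustration only: 'my_krc' and the two
helpers are made-up names, not from the patch, and the sketch assumes
<linux/percpu.h> and <linux/spinlock.h>):

struct my_krc {
	spinlock_t lock;
	int count;
};

static DEFINE_PER_CPU(struct my_krc, my_krc) = {
	.lock = __SPIN_LOCK_UNLOCKED(my_krc.lock),
};

/* Current pattern: interrupts are disabled first, so this_cpu_ptr() cannot
 * race with migration and we always operate on the local CPU's structure.
 */
static void my_krc_inc_irq_off(void)
{
	struct my_krc *krcp;
	unsigned long flags;

	local_irq_save(flags);
	krcp = this_cpu_ptr(&my_krc);
	spin_lock(&krcp->lock);
	krcp->count++;
	spin_unlock(&krcp->lock);
	local_irq_restore(flags);
}

/* Proposed pattern: sample a per-CPU pointer, then take its lock.  If the
 * task migrates between raw_cpu_ptr() and spin_lock_irqsave(), we end up
 * on another CPU's structure -- safe, because the lock protects it, but it
 * gives up the cache locality mentioned above.
 */
static void my_krc_inc_raw(void)
{
	struct my_krc *krcp;
	unsigned long flags;

	krcp = raw_cpu_ptr(&my_krc);
	spin_lock_irqsave(&krcp->lock, flags);
	krcp->count++;
	spin_unlock_irqrestore(&krcp->lock, flags);
}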
In the common case, the callback would be executed on the same CPU it was
queued on, so all of the access is local to that CPU. I know the work item
can still be handled on other CPUs, so we are not strictly hitting the
worst case (per my understanding, the work item is executed on the same
CPU most of the time).

Also, on a slightly related note, could you share with me why this_cpu_ptr()
is unsafe to call in preemptible context, but raw_cpu_ptr() is OK? Just
curious about that. (A small sketch of what I mean is below the quoted
patch.)

thanks,

 - Joel

>
>  	// Queue the object but don't yet schedule the batch.
>  	if (debug_rcu_head_queue(head)) {
> @@ -3172,9 +3167,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	}
>
>  unlock_return:
> -	if (krcp->initialized)
> -		spin_unlock(&krcp->lock);
> -	local_irq_restore(flags);
> +	spin_unlock_irqrestore(&krcp->lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(kfree_call_rcu);
>
> @@ -4137,17 +4130,16 @@ static void __init kfree_rcu_batch_init(void)
>  	int cpu;
>  	int i;
>
> +	WARN_ON(rcu_scheduler_active == RCU_SCHEDULER_RUNNING);
>  	for_each_possible_cpu(cpu) {
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> -		spin_lock_init(&krcp->lock);
>  		for (i = 0; i < KFREE_N_BATCHES; i++) {
>  			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
>  			krcp->krw_arr[i].krcp = krcp;
>  		}
>
>  		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
> -		krcp->initialized = true;
>  	}
>  	if (register_shrinker(&kfree_rcu_shrinker))
>  		pr_err("Failed to register kfree_rcu() shrinker!\n");
> --
> 2.26.0
>
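P.S. The sketch referred to above, reusing the made-up 'my_krc' per-CPU
variable from the earlier sketch (purely illustrative; my possibly wrong
understanding is that with CONFIG_DEBUG_PREEMPT the this_cpu_ptr() below
goes through the smp_processor_id() debug check, while raw_cpu_ptr() skips
any such check):

/* Nothing here disables preemption or interrupts, so the task may migrate
 * at any point.
 */
static void my_krc_peek(void)
{
	struct my_krc *a, *b;

	/* With CONFIG_DEBUG_PREEMPT I would expect this to trigger the
	 * "BUG: using smp_processor_id() in preemptible" splat, since
	 * migration is not prevented here.
	 */
	a = this_cpu_ptr(&my_krc);

	/* No debug check; the caller accepts that the returned pointer may
	 * belong to a CPU we have already migrated away from.
	 */
	b = raw_cpu_ptr(&my_krc);

	(void)a;
	(void)b;
}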