On Mon, Oct 19, 2020 at 5:37 AM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote: > [..] > > > > > > I'm very likely missing something obvious somewhere. > > > > > > CPU 0 CPU 1 > > > rcu_barrier() call_rcu()/rcu_segcblist_enqueue() > > > ------------ -------- > > > > > > smp_mb(); > > > inc_len(); > > > smp_mb(); > > > queue callback; > > > for_each_possible_cpu(cpu) > > > if (!rcu_segcblist_n_cbs(&rdp->cblist)) > > > continue; > > > > > > > > invoke_callback > > > > If CPU 0 saw the enqueue of the callback (that is the CPU 1's writes to the > > segcb_list propagated to CPU 0), then it would have also seen the > > effects of the inc_len. I forced this case in my last litmus test by this > > code in P1(): > > But then I can't find to which part of rcu_barrier() this refers to. > I see the len read before anything else. > > > > > r1 = READ_ONCE(*enq); > > smp_mb(); /* barrier Just for test purpose to show that the.. */ > > /* ..rcu_barrier() saw list modification */ > > > > On the other hand, if CPU 0 did not see the enqueue, then there is really no > > issue. Since that is the same case where call_rcu() happened _after_ the > > rcu_barrier() and there's no race. rcu_barrier() does not need to wait if > > there was no callback enqueued. > > > > This is not exactly the easiest thing to explain, hence the litmus. > > Now, reading the documentation of rcu_barrier() (thanks to you!): > > Pseudo-code using rcu_barrier() is as follows: > > 1. Prevent any new RCU callbacks from being posted. > 2. Execute rcu_barrier(). > 3. Allow the module to be unloaded. > Basically, you are saying that if all CPUs agree that len == 0 henceforth (through other memory barriers), then callback enqueuing does not need a memory barrier before setting length to 0. I think that makes sense but is it worth removing the memory barrier before WRITE(len, 1) and hoping after #1, the caller would have ensured things are fine? Also I am not sure if the above is the only usecase for rcu_barrier(). > I think with point 1, it is assumed that the caller of rcu_barrier() must have > not only stopped but also sync'ed with the possible enqueuers. Correct me if I'm wrong > here. So for example if a kthread used to post the module RCU callbacks, calling kthread_stop() > does the job as it prevents from further RCU callbacks from being enqueued and it also syncs > with the kthread thanks to the completion implied by any caller of kthread_stop() which then > sees what the kthread has read and written, including RCU callbacks enqueued. So if the caller > of kthread_stop() calls rcu_barrier() right after, rcu_barrier() should see at least the len > corresponding to the last enqueue. > > cancel_work_sync() also seem to really sync as well. I'm less sure about del_timer_sync(). > > Say we have: > > expire_timers (CPU 0) CPU 1 > ------------- ----------- > detach_timer(timer) > raw_spin_unlock(&base->lock); > call_timer_fn(timer, fn, baseclk); > -> enqueue callback > //would need at least smp_wmb() here > base->running_timer = NULL; > > del_timer_sync() { > raw_spin_lock(&base->lock); > if (base->running_timer != timer) > ret = detach_if_pending(timer, base, true); > if (!timer_pending()) > return 0; > raw_spin_unlock(&base->lock); > } > //would need at least smp_rmb() here Regarding "would need at least smp_rmb.." : But the rcu_barrier() has the control dependency we discussed in last emails, between READ(len) and whatever follows the rcu_barrier(). That itself will provide the ordering right? > //although rcu_seq_start() implies a full barrier > rcu_barrier() { > // Sees rcu_segcblist_n_cbs(rdp(CPU 0)->cblist) == 0 > // So ignore it > > > But I'm sure I'm missing something obvious. That's my specialism. I could be missing something too :-/. But I'll include this patch in my next posting anyway and let us also maybe see if Paul disagrees. thanks, - Joel