Hi Mathieu, Thanks a lot for sharing details of side-rcu, I will review it over the holidays! I agree we can/should borrow from that implementation where possible. I replied to the comments below: On Sun, Dec 18, 2022 at 4:07 PM Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote: > > On 2022-12-18 14:13, Joel Fernandes (Google) wrote: > > The comment says that if an updater saw lock count updates, then > > ensure the reader does not see the new srcu_idx. > > > > However, there is no memory barrier between a READER reading srcu_idx > > with respect to incrementing the lock count for that srcu_idx. > > > > So what is really happening is, both "B" and "C" will order the current > > reader's unlock count update, and the _next_ readers lock count update, with > > respect to the write to the currently active index. > > > > Consider first the case of the unlock count update being seen by the UPDATER: > > > > (for brevity, the pseudocode shortens "srcu_idx" to "idx") > > > > READER UPDATER > > > > rcu_read_lock() { > > idx = READ(idx); > > lock_count[idx]++; > > > > smp_mb(); // B > > } > > srcu_flip() { > > smp_mb(); //E > > idx++; > > smp_mb(); > > } > > rcu_read_unlock() { > > smp_mb(); // C > > unlock_count[idx]++; > > } > > > > Consider that the updater saw the unlock count update, and due to this, we > > expect "E" to make sure that the reader only used the old srcu_idx. > > > > However, say the reader used the new srcu_idx because we dropped "E". That is > > totally OK because both unlock and lock counts of this reader will negate each > > other during the next scan of the srcu_idx. So we don't have to guarantee at > > all that the reader used the old srcu_idx, that does not buy us anything > > because if it used the new one, we would just ignore it during the next scan > > anyway (the reader is "done"). > > > > Now lets look at the following case: > > > > READER UPDATER > > > > rcu_read_lock() { > > idx = READ(idx); > > lock_count[idx]++; > > > > smp_mb(); // B > > } > > > > rcu_read_unlock() { > > smp_mb(); // C > > unlock_count[idx]++; > > } > > srcu_flip() { > > smp_mb(); //E > > idx++; > > rcu_read_lock() { > > idx = READ(idx); > > lock_count[idx]++; > > > > smp_mb(); // B > > smp_mb(); > > } > > } > > > > Consider that the updater saw the lock count update of the second > > rcu_read_lock(). It does not matter that we guarantee that the reader sees only > > the old srcu_idx. This is because, a reader could totally just sample > > srcu_idx, and stay preempted for long periods of time. So, during any scan, we > > already have the issue of a preempted-reader randomly springing up with a copy > > of the index which we consider the "new index". So guaranteeing that the reader > > saw the old srcu_idx instead of the new one if we saw its lock count updates, > > also does not buy us anything. > > > > Due to these reasons, drop the argument that the reader has to see a certain > > srcu_idx since we have no control over that anyway, and guaranteeing that does not > > buy us anything. > > I don't understand why this first patch only removes a comment about the > need to order things, when in fact it's the entire memory barrier /* E > */ that is useless. > > I suspect we should just remove the comment along with the barrier > without this added step. Yes, this is what I do in patch 2/2. I just wanted to break it down because the change log would be too big otherwise. There were 2 reasons given for the "E" memory barrier, so I split it into 2 patches for that reason so both patches can be discussed possibly separately. But no issues with squashing as well and merging change logs. > What SRCU fundamentally does is detect quiescence of all SRCU readers > between the beginning of the grace period and its completion. In order > to ensure forward progress, it does so in a two-phase algorithm. What > the grace period does to detect quiescence is to observe that each of > the periods (0/1) have no active reader at a given point in the grace > period. Yes agreed! By deactivating an index, it is possible to ensure new readers don't show up on the deactivated one, and so we are bound in scan time / forward progress. (Except for the special case where a reader was preempted after picking up the index, and then suddenly shows up in the scan). > Then the fact that the period is flipped to send new-coming > readers into a different period is just to ensure forward progress, and > is basically irrelevant (order-wise) with the fact that the grace period > scans both periods and validates that both periods are observed to have > no active readers. Yes. > I'd favor just clarifying the SRCU model in this way, and then remove > the useless barrier, and also implement improvements to the SRCU > algorithm and skip the "flip" entirely when we notice (early) that no > active readers are present in both periods. This is actually similar to > what I implemented in side-rcu. > Thoughts ? Agreed! Thanks so much for sharing this, and I will take a look on my side and study it. Interesting on that optimization to skip the flip! Thanks, - Joel > > Thanks, > > Mathieu > > > > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> > > --- > > kernel/rcu/srcutree.c | 10 ++++------ > > 1 file changed, 4 insertions(+), 6 deletions(-) > > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > > index 1c304fec89c0..d6a4c2439ca6 100644 > > --- a/kernel/rcu/srcutree.c > > +++ b/kernel/rcu/srcutree.c > > @@ -983,12 +983,10 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount) > > static void srcu_flip(struct srcu_struct *ssp) > > { > > /* > > - * Ensure that if this updater saw a given reader's increment > > - * from __srcu_read_lock(), that reader was using an old value > > - * of ->srcu_idx. Also ensure that if a given reader sees the > > - * new value of ->srcu_idx, this updater's earlier scans cannot > > - * have seen that reader's increments (which is OK, because this > > - * grace period need not wait on that reader). > > + * Ensure that if a given reader sees the new value of ->srcu_idx, this > > + * updater's earlier scans cannot have seen that reader's increments > > + * (which is OK, because this grace period need not wait on that > > + * reader). > > */ > > smp_mb(); /* E */ /* Pairs with B and C. */ > > > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com >