On Wed, Dec 14, 2022 at 08:34:03PM -0500, Joel Fernandes wrote: > On Wed, Dec 14, 2022 at 7:04 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > On Wed, Dec 14, 2022 at 11:14:48PM +0000, Joel Fernandes wrote: > > > On Wed, Dec 14, 2022 at 11:10 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > > > > > > > On Wed, Dec 14, 2022 at 11:07 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > > > > > > > > > On Wed, Dec 14, 2022 at 9:24 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > > > > I also did not get why you care about readers that come and ago (you > > > > > > > mentioned the first reader seeing incorrect idx and the second reader > > > > > > > seeing the right flipped one, etc). Those readers are irrelevant > > > > > > > AFAICS since they came and went, and need not be waited on , right?. > > > > > > > > > > > > The comment is attempting to show (among other things) that we don't > > > > > > need to care about readers that come and go more than twice during that > > > > > > critical interval of time during the counter scans. > > > > > > > > > > Why do we need to care about readers that come and go even once? Once > > > > > they are gone, they have already done an unlock() and their RSCS is > > > > > over, so they need to be considered AFAICS. > > > > > > > > > > > > > Aargh, I meant: "so they need to be considered AFAICS". > > > > > > Trying again: "so they need not be considered AFAICS". > > > > Give or take counter wrap, which can make it appear that still-present > > readers have finished. > > Ah you mean those flood of readers affect the counter wrapping and not > that those readers have to be waited on or anything, they just happen > to have a side-effect on *existing readers* which need to be waited > on. Exactly, that flood of readers could potentially result in a counter-wrap-induced too-short SRCU grace period, and it is SRCU's job to avoid that, and specifically the job of the code that this comment lives in. > Thanks a lot for this explanation, this part I agree. Readers that > sampled the idx before the flip happened, and then did their > lock+unlock counter increments both after the flip, and after the > second unlock counter scan (second scan), can mess up the lock > counters such that the second scan found lock==unlock, even though it > is not to be due to pre-existing readers. But as you pointed out, > there have to be a substantially large number of these to cause the > equality check to match. This might be another reason why it is > important to scan the unlocks first, because the locks are what have > to cause the wrap around of the lock counter. Instead if you counted > locks first, then the unlocks would have to do the catching up to the > locks which are much fewer than a full wrap around. True enough! > I still don't see why this affects only the first reader. There could > be more than 1 reader that needs to be waited on (the readers that > started before the grace period started). Say there are 5 of them. > When the grace period starts, the interfering readers (2^32 of them or > so) could have sampled the old idx before the flip, and then do > lock+unlock (on that old pre-flip() idx) in quick succession after the > smp_mb() in the second srcu_readers_active_idx_check(). That causes > those 5 poor readers to not be waited on. Granted, any new readers > after this thundering herd should see the new idx and will not be > affected, thanks to the memory barriers. Yes, there could be quite a few such readers, but it only takes one messed-up reader to constitute a bug in SRCU. ;-) > Still confused, but hey I'll take it little at a time ;-) Also thanks > for the suggestions for litmus tests. Agreed, setting this sort of thing aside for a bit and then coming back to it can be helpful. Thanx, Paul > Cheers, > > - Joel > > > > Anyway, my 1 year old son is sick so signing off for now. Thanks. > > > > Ouch! I hope he recovers quickly and completely!!! > > > > Thanx, Paul