>>>> I am trying to understand the (unlikely) case where synchronize_srcu() is done before any >>>> srcu reader lock/unlock lite call is done. Can new SRCU readers fail to observe the >>>> updates? >>> >>> If a SRCU reader fail to observe the index flip, then isn't it the case >>> that the synchronize_rcu() invoked from srcu_readers_active_idx_check() >>> must wait on it? >> >> Below is the sequence of operations I was thinking of, where at step 4 CPU2 >> reads old pointer >> >> ptr = old >> >> >> CPU1 CPU2 >> >> 1. Update ptr = new >> >> 2. synchronize_srcu() >> >> <Does not use synchronize_rcu() >> as SRCU_READ_FLAVOR_LITE is not >> set for any sdp as srcu_read_lock_lite() >> hasn't been called by any CPU> >> >> 3. srcu_read_lock_lite() >> <No smp_mb() ordering> >> >> 4. Can read ptr == old ? > > As long as the kernel was built with CONFIG_PROVE_RCU=y and given a fix > to the wrong-CPU issue you quite rightly point out below, no it cannot. > The CPU's first call to srcu_read_lock_lite() will use cmpxchg() to update > ->srcu_reader_flavor, which will place full ordering between that update > and the later read from "ptr". > > So if the synchronize_srcu() is too early to see the SRCU_READ_FLAVOR_LITE > bit, then the reader must see the new value of "ptr". Similarly, > if the reader can see the old value of "ptr", then synchronize_srcu() > must see the reader's setting of the SRCU_READ_FLAVOR_LITE bit. > > But both the CONFIG_PROVE_RCU=n and the wrong-CPU issue must be fixed > for this to work. Please see the upcoming patches to be posted as a > reply to this email. > Got it. It will be good to document how full ordering provided by cmpxchg() helps here. >>>>> + if (mask & SRCU_READ_FLAVOR_LITE && !gp) >>>>> + return false; >>>> >>>> So, srcu_readers_active_idx_check() can potentially return false for very long >>>> time, until the CPU executing srcu_readers_active_idx_check() does >>>> at least one read lock/unlock lite call? >>> >>> That is correct. The theory is that until after an srcu_read_lock_lite() >>> has executed, there is no need to wait on it. Does the practice match the >>> theory in this case, or is there some sequence of events that I missed? >> >> Below sequence >> >> CPU1 CPU2 >> 1. srcu_read_lock_lite() >> >> >> 2. srcu_read_unlock_lite() >> >> 3. synchronize_srcu() >> >> 3.1 srcu_readers_lock_idx() is >> called with gp = false as >> srcu_read_lock_lite() was never >> called on this CPU for this >> srcu_struct. So >> ssp->sda->srcu_reader_flavor is not >> set for CPU1's sda. > > Good eyes! Yes, the scan that sums the ->srcu_unlock_count[] counters > must also OR together the ->srcu_reader_flavor fields. > Agree >> 3.2 Inside srcu_readers_lock_idx() >> "mask" contains SRCU_READ_FLAVOR_LITE >> as CPU2's sdp->srcu_reader_flavor has it. >> >> 3.3 CPU1 keeps returning false from >> below check until CPU1 does at least >> one srcu_read_lock_lite() call or >> the thread migrates. >> >> if (mask & SRCU_READ_FLAVOR_LITE && !gp) >> return false; > > This is also fixed by the OR of the ->srcu_reader_flavor fields, correct? > Yes, agree. > I guess I could claim that this bug prevents the wrong-CPU bug above > from resulting in a too-short SRCU grace period, but it is of course > better to just fix the bugs. ;-) > >>>>> + return sum == unlocks; >>>>> } >>>>> >>>>> /* >>>>> @@ -473,6 +482,7 @@ static unsigned long srcu_readers_unlock_idx(struct srcu_struct *ssp, int idx) >>>>> */ >>>>> static bool srcu_readers_active_idx_check(struct srcu_struct *ssp, int idx) >>>>> { >>>>> + bool did_gp = !!(raw_cpu_read(ssp->sda->srcu_reader_flavor) & SRCU_READ_FLAVOR_LITE); >>>> >>>> sda->srcu_reader_flavor is only set when CONFIG_PROVE_RCU is enabled. But we >>>> need the reader flavor information for srcu lite variant to work. So, lite >>>> variant does not work when CONFIG_PROVE_RCU is disabled. Am I missing something >>>> obvious here? >>> >>> At first glance, it appears that I am the one who missed something obvious. >>> Including in testing, which failed to uncover this issue. >>> >>> Thank you for the careful reviews! >> >> Sure thing, no problem! > > And again, thank you! > Again, no problem! - Neeraj