On Sun, Dec 18, 2022 at 7:04 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > Hi Mathieu, > > On Sun, Dec 18, 2022 at 6:38 PM Mathieu Desnoyers > <mathieu.desnoyers@xxxxxxxxxxxx> wrote: > > > > On 2022-12-18 16:30, Joel Fernandes wrote: > > > Hi Mathieu, > > > > > > On Sun, Dec 18, 2022 at 3:56 PM Mathieu Desnoyers > > > <mathieu.desnoyers@xxxxxxxxxxxx> wrote: > > >> > > >> On 2022-12-18 14:13, Joel Fernandes (Google) wrote: > > >>> Hello, I believe the pre-flip memory barrier is not required. The only reason I > > >>> can say to remove it, other than the possibility that it is unnecessary, is to > > >>> not have extra code that does not help. However, since we are issuing a fully > > >>> memory-barrier after the flip, I cannot say that it hurts to do it anyway. > > >>> > > >>> For this reason, please consider these patches as "informational", than a > > >>> "please merge". :-) Though, feel free to consider merging if you agree! > > >>> > > >>> All SRCU scenarios pass with these, with 6 hours of testing. > > >> > > >> Hi Joel, > > >> > > >> Please have a look at the comments in my side-rcu implementation [1, 2]. > > >> It is similar to what SRCU does (per-cpu counter based grace period > > >> tracking), but implemented for userspace. The comments explain why this > > >> works without the memory barrier you identify as useless in SRCU. > > >> > > >> Following my implementation of side-rcu, I reviewed the SRCU comments > > >> and identified that the barrier "/* E */" appears to be useless. I even > > >> discussed this privately with Paul E. McKenney. > > >> > > >> My implementation and comments go further though, and skip the period > > >> "flip" entirely if the first pass observes that all readers (in both > > >> periods) are quiescent. > > > > > > Actually in SRCU, the first pass scans only 1 index, then does the > > > flip, and the second pass scans the second index. Without doing a > > > flip, an index cannot be scanned for forward progress reasons because > > > it is still "active". So I am curious how you can skip flip and still > > > scan both indexes? I will dig more into your implementation to learn more. > > > > If we look at SRCU read-side: > > > > int __srcu_read_lock(struct srcu_struct *ssp) > > { > > int idx; > > > > idx = READ_ONCE(ssp->srcu_idx) & 0x1; > > this_cpu_inc(ssp->sda->srcu_lock_count[idx]); > > smp_mb(); /* B */ /* Avoid leaking the critical section. */ > > return idx; > > } > > > > If the thread is preempted for a long period of time between load of > > ssp->srcu_idx and increment of srcu_lock_count[idx], this means this > > thread can appear as a "new reader" for the idx period at any arbitrary > > time in the future, independently of which period is the current one > > within a future grace period. > > > > As a result, the grace period algorithm needs to inherently support the > > fact that a "new reader" can appear in any of the two periods, > > independently of the current period state. > > > > As a result, this means that while within period "0", we _need_ to allow > > newly coming readers to appear as we scan period "0". > > Sure, it already does handle it but that is I believe it is a corner > case, not the norm. > > > As a result, we can simply scan both periods 0/1 for reader quiescence, > > even while new readers appear within those periods. > > I think this is a bit dangerous. Yes there is the preemption thing you > mentioned above, but that is bounded since you can only have a fixed > number of tasks that underwent that preemption, and it is quite rare > in the sense, each reader should get preempted just after sampling idx > but not incrementing lock count. > > However, if we scan while new readers appear (outside of the above > preemption problem), we can have counter wrap causing a false match > much quicker. > The scan loop is: > check_readers(idx) { > count_all_unlocks(idx); > smp_mb(); > count_all_locks(idx); > bool done = (locks == unlocks) > if (done) { > // readers are done, end scan for this idx. > } else { > // try again later > } > } > > So if check_readers() got preempted just after the smp_mb(), then you > can have lots of tasks enter and exit the read-side critical section > and increment the locks count. Eventually locks == unlocks will > happen, and it is screwed. Sure this is also theoretical, but yeah > that issue can be made "worse" by scanning active readers > deliberately, especially when such readers can also nest arbitrarily. > > > As a result, flipping between periods 0/1 is just relevant for forward > > progress, not for correctness. > > Sure, agreed, forward progress. Adding to the last statement "But also correctness as described above". thanks, - Joel