On Mon, Dec 19, 2022 at 7:55 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > On Sun, Dec 18, 2022 at 8:49 PM Mathieu Desnoyers > <mathieu.desnoyers@xxxxxxxxxxxx> wrote: > [...] > > >>>>> > > >>>>> On 2022-12-18 14:13, Joel Fernandes (Google) wrote: > > >>>>>> Hello, I believe the pre-flip memory barrier is not required. The only reason I > > >>>>>> can say to remove it, other than the possibility that it is unnecessary, is to > > >>>>>> not have extra code that does not help. However, since we are issuing a fully > > >>>>>> memory-barrier after the flip, I cannot say that it hurts to do it anyway. > > >>>>>> > > >>>>>> For this reason, please consider these patches as "informational", than a > > >>>>>> "please merge". :-) Though, feel free to consider merging if you agree! > > >>>>>> > > >>>>>> All SRCU scenarios pass with these, with 6 hours of testing. > > >>>>> > > >>>>> Hi Joel, > > >>>>> > > >>>>> Please have a look at the comments in my side-rcu implementation [1, 2]. > > >>>>> It is similar to what SRCU does (per-cpu counter based grace period > > >>>>> tracking), but implemented for userspace. The comments explain why this > > >>>>> works without the memory barrier you identify as useless in SRCU. > > >>>>> > > >>>>> Following my implementation of side-rcu, I reviewed the SRCU comments > > >>>>> and identified that the barrier "/* E */" appears to be useless. I even > > >>>>> discussed this privately with Paul E. McKenney. > > >>>>> > > >>>>> My implementation and comments go further though, and skip the period > > >>>>> "flip" entirely if the first pass observes that all readers (in both > > >>>>> periods) are quiescent. > > >>>> > > >>>> Actually in SRCU, the first pass scans only 1 index, then does the > > >>>> flip, and the second pass scans the second index. Without doing a > > >>>> flip, an index cannot be scanned for forward progress reasons because > > >>>> it is still "active". So I am curious how you can skip flip and still > > >>>> scan both indexes? I will dig more into your implementation to learn more. > > >>> > > >>> If we look at SRCU read-side: > > >>> > > >>> int __srcu_read_lock(struct srcu_struct *ssp) > > >>> { > > >>> int idx; > > >>> > > >>> idx = READ_ONCE(ssp->srcu_idx) & 0x1; > > >>> this_cpu_inc(ssp->sda->srcu_lock_count[idx]); > > >>> smp_mb(); /* B */ /* Avoid leaking the critical section. */ > > >>> return idx; > > >>> } > > >>> > > >>> If the thread is preempted for a long period of time between load of > > >>> ssp->srcu_idx and increment of srcu_lock_count[idx], this means this > > >>> thread can appear as a "new reader" for the idx period at any arbitrary > > >>> time in the future, independently of which period is the current one > > >>> within a future grace period. > > >>> > > >>> As a result, the grace period algorithm needs to inherently support the > > >>> fact that a "new reader" can appear in any of the two periods, > > >>> independently of the current period state. > > >>> > > >>> As a result, this means that while within period "0", we _need_ to allow > > >>> newly coming readers to appear as we scan period "0". > > >> > > >> Sure, it already does handle it but that is I believe it is a corner > > >> case, not the norm. > > >> > > >>> As a result, we can simply scan both periods 0/1 for reader quiescence, > > >>> even while new readers appear within those periods. > > >> > > >> I think this is a bit dangerous. Yes there is the preemption thing you > > >> mentioned above, but that is bounded since you can only have a fixed > > >> number of tasks that underwent that preemption, and it is quite rare > > >> in the sense, each reader should get preempted just after sampling idx > > >> but not incrementing lock count. > > >> > > >> However, if we scan while new readers appear (outside of the above > > >> preemption problem), we can have counter wrap causing a false match > > >> much quicker. > > >> The scan loop is: > > >> check_readers(idx) { > > >> count_all_unlocks(idx); > > >> smp_mb(); > > >> count_all_locks(idx); > > >> bool done = (locks == unlocks) > > >> if (done) { > > >> // readers are done, end scan for this idx. > > >> } else { > > >> // try again later > > >> } > > >> } > > >> > > >> So if check_readers() got preempted just after the smp_mb(), then you > > >> can have lots of tasks enter and exit the read-side critical section > > >> and increment the locks count. Eventually locks == unlocks will > > >> happen, and it is screwed. Sure this is also theoretical, but yeah > > >> that issue can be made "worse" by scanning active readers > > >> deliberately, especially when such readers can also nest arbitrarily. > > >> > > >>> As a result, flipping between periods 0/1 is just relevant for forward > > >>> progress, not for correctness. > > >> > > >> Sure, agreed, forward progress. > > > > > > Adding to the last statement "But also correctness as described above". > > > > Exactly how many entry/exit of the read-side critical section while the > > grace period is preempted do you need to trigger this ? > > It depends on how many readers are active during the preemption of the > scan code. Say the preemption happened after per-CPU unlock counts > were totalled. Then AFAICS, if there are N active readers which need > the grace period to wait, you need (2^sizeof(int) - N) number of > lock+unlock to happen. Sorry, here I meant (2^(sizeof(unsigned long) * 8) - N) which is 2^32 on 32-bit systems. thanks, - Joel > > On a 64-bit system, where 64-bit counters are used, AFAIU this need to > > be exactly 2^64 read-side critical sections. > > Yes, but what about 32-bit systems? > > > There are other synchronization algorithms such as seqlocks which are > > quite happy with much less protection against overflow (using a 32-bit > > counter even on 64-bit architectures). > > The seqlock is an interesting point. > > > For practical purposes, I suspect this issue is really just theoretical. > > I have to ask, what is the benefit of avoiding a flip and scanning > active readers? Is the issue about grace period delay or performance? > If so, it might be worth prototyping that approach and measuring using > rcutorture/rcuscale. If there is significant benefit to current > approach, then IMO it is worth exploring. > > > Or am I missing your point ? > > No, I think you are not. Let me know if I missed something. > > Thanks, > > - Joel > > > > > > > > > > > > thanks, > > > > > > - Joel > > > > -- > > Mathieu Desnoyers > > EfficiOS Inc. > > https://www.efficios.com > >