On Sun, Dec 18, 2022 at 8:49 PM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
[...]
> >>>>>
> >>>>> On 2022-12-18 14:13, Joel Fernandes (Google) wrote:
> >>>>>> Hello, I believe the pre-flip memory barrier is not required. The
> >>>>>> only reason I can give to remove it, other than the possibility
> >>>>>> that it is unnecessary, is to not have extra code that does not
> >>>>>> help. However, since we are issuing a full memory barrier after
> >>>>>> the flip, I cannot say that it hurts to do it anyway.
> >>>>>>
> >>>>>> For this reason, please consider these patches as "informational"
> >>>>>> rather than a "please merge". :-) Though, feel free to consider
> >>>>>> merging if you agree!
> >>>>>>
> >>>>>> All SRCU scenarios pass with these, with 6 hours of testing.
> >>>>>
> >>>>> Hi Joel,
> >>>>>
> >>>>> Please have a look at the comments in my side-rcu implementation
> >>>>> [1, 2]. It is similar to what SRCU does (per-CPU counter based
> >>>>> grace period tracking), but implemented for userspace. The comments
> >>>>> explain why this works without the memory barrier you identify as
> >>>>> useless in SRCU.
> >>>>>
> >>>>> Following my implementation of side-rcu, I reviewed the SRCU
> >>>>> comments and identified that the barrier "/* E */" appears to be
> >>>>> useless. I even discussed this privately with Paul E. McKenney.
> >>>>>
> >>>>> My implementation and comments go further though, and skip the
> >>>>> period "flip" entirely if the first pass observes that all readers
> >>>>> (in both periods) are quiescent.
> >>>>
> >>>> Actually in SRCU, the first pass scans only one index, then does the
> >>>> flip, and the second pass scans the second index. Without doing a
> >>>> flip, an index cannot be scanned, for forward-progress reasons,
> >>>> because it is still "active". So I am curious how you can skip the
> >>>> flip and still scan both indexes? I will dig more into your
> >>>> implementation to learn more.
> >>>
> >>> If we look at the SRCU read-side:
> >>>
> >>> int __srcu_read_lock(struct srcu_struct *ssp)
> >>> {
> >>>         int idx;
> >>>
> >>>         idx = READ_ONCE(ssp->srcu_idx) & 0x1;
> >>>         this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
> >>>         smp_mb(); /* B */  /* Avoid leaking the critical section. */
> >>>         return idx;
> >>> }
> >>>
> >>> If the thread is preempted for a long period of time between the load
> >>> of ssp->srcu_idx and the increment of srcu_lock_count[idx], then this
> >>> thread can appear as a "new reader" for the idx period at any
> >>> arbitrary time in the future, independently of which period is the
> >>> current one within a future grace period.
> >>>
> >>> As a result, the grace period algorithm needs to inherently support
> >>> the fact that a "new reader" can appear in either of the two periods,
> >>> independently of the current period state.
> >>>
> >>> As a result, this means that while within period "0", we _need_ to
> >>> allow newly arriving readers to appear as we scan period "0".
> >>
> >> Sure, it already handles that, but I believe it is a corner case, not
> >> the norm.
> >>
> >>> As a result, we can simply scan both periods 0/1 for reader
> >>> quiescence, even while new readers appear within those periods.
> >>
> >> I think this is a bit dangerous. Yes, there is the preemption issue
> >> you mentioned above, but that is bounded, since you can only have a
> >> fixed number of tasks that underwent that preemption, and it is quite
> >> rare in the sense that each such reader must be preempted just after
> >> sampling idx but before incrementing the lock count.
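
To make that window concrete, here is a minimal userspace sketch of the
interleaving (a hypothetical model only, not the kernel code: C11
atomics replace READ_ONCE()/this_cpu_inc(), and a sleep() stands in for
an arbitrarily long preemption):

/*
 * Hypothetical model of the preempted-reader window, not the kernel
 * implementation: a single global counter pair stands in for the
 * per-CPU srcu_lock_count sums.
 */
#include <stdatomic.h>
#include <unistd.h>

static _Atomic int srcu_idx;                /* current period: 0 or 1 */
static _Atomic unsigned long lock_count[2]; /* readers seen per period */

static int model_read_lock(void)
{
	int idx = atomic_load_explicit(&srcu_idx, memory_order_relaxed) & 0x1;

	sleep(1); /* "preempted" here: srcu_idx may flip many times */

	/* The increment still lands in the period sampled above. */
	atomic_fetch_add_explicit(&lock_count[idx], 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* the smp_mb() "B" */
	return idx;
}

int main(void)
{
	return model_read_lock(); /* appears as a "new reader" for idx */
}

However many times srcu_idx flips while the reader is delayed, its
increment lands in the period it sampled first, which is why the grace
period must tolerate a new reader showing up in either period.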
> >>
> >> However, if we scan while new readers appear (outside of the above
> >> preemption problem), we can have counter wrap causing a false match
> >> much more quickly. The scan loop is:
> >>
> >> check_readers(idx) {
> >>         unlocks = count_all_unlocks(idx);
> >>         smp_mb();
> >>         locks = count_all_locks(idx);
> >>         bool done = (locks == unlocks);
> >>         if (done) {
> >>                 // Readers are done, end scan for this idx.
> >>         } else {
> >>                 // Try again later.
> >>         }
> >> }
> >>
> >> So if check_readers() got preempted just after the smp_mb(), then you
> >> can have lots of tasks enter and exit the read-side critical section
> >> and increment the locks count. Eventually locks == unlocks will
> >> happen, and it is screwed. Sure, this is also theoretical, but yeah,
> >> that issue can be made "worse" by deliberately scanning active
> >> readers, especially when such readers can also nest arbitrarily.
> >>
> >>> As a result, flipping between periods 0/1 is just relevant for
> >>> forward progress, not for correctness.
> >>
> >> Sure, agreed, forward progress.
> >
> > Adding to the last statement: "But also correctness as described above".
>
> Exactly how many entries/exits of the read-side critical section while
> the grace period is preempted do you need to trigger this?

It depends on how many readers are active during the preemption of the
scan code. Say the preemption happened after the per-CPU unlock counts
were totalled. Then AFAICS, if there are N active readers that the grace
period needs to wait on, you need (2^B - N) lock+unlock pairs to happen,
where B is the bit width of the counter.

> On a 64-bit system, where 64-bit counters are used, AFAIU this needs to
> be exactly 2^64 read-side critical sections.

Yes, but what about 32-bit systems?

> There are other synchronization algorithms such as seqlocks which are
> quite happy with much less protection against overflow (using a 32-bit
> counter even on 64-bit architectures).

The seqlock is an interesting point.

> For practical purposes, I suspect this issue is really just theoretical.

I have to ask: what is the benefit of avoiding the flip and scanning
active readers? Is the issue grace-period delay, or performance? If so,
it might be worth prototyping that approach and measuring it using
rcutorture/rcuscale. If there is a significant benefit over the current
approach, then IMO it is worth exploring.

> Or am I missing your point?

No, I think you are not. Let me know if I missed something.

Thanks,

 - Joel
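
P.S. For completeness, a minimal userspace model of the wrap hazard in
the scan step (hypothetical names; one C11 atomic counter pair stands in
for the summed per-CPU lock/unlock counts, and unsigned int gives the
2^32 window relevant to 32-bit systems):

#include <stdatomic.h>
#include <stdbool.h>

static _Atomic unsigned int locks[2], unlocks[2];

static bool check_readers(int idx)
{
	unsigned int u = atomic_load_explicit(&unlocks[idx],
					      memory_order_relaxed);

	atomic_thread_fence(memory_order_seq_cst); /* the smp_mb() above */

	/*
	 * If the scanner is preempted right here while readers keep
	 * entering and exiting period idx, locks[idx] keeps advancing.
	 * After about 2^32 - N enter/exit pairs (N = readers the grace
	 * period must genuinely wait for), it wraps and the comparison
	 * below spuriously reports quiescence.
	 */
	unsigned int l = atomic_load_explicit(&locks[idx],
					      memory_order_relaxed);

	return l == u; /* false match possible after counter wrap */
}

int main(void)
{
	atomic_fetch_add(&locks[0], 1);  /* one reader inside period 0 */
	return check_readers(0) ? 1 : 0; /* 0: correctly not quiescent */
}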