> On Dec 20, 2022, at 1:13 PM, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> On 2022-12-20 13:05, Joel Fernandes wrote:
>> Hi Mathieu,
>>> On Tue, Dec 20, 2022 at 5:00 PM Mathieu Desnoyers
>>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>>
>>> On 2022-12-19 20:04, Joel Fernandes wrote:
>>>> On Mon, Dec 19, 2022 at 7:55 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>> [...]
>>>>>> On a 64-bit system, where 64-bit counters are used, AFAIU this need to
>>>>>> be exactly 2^64 read-side critical sections.
>>>>>
>>>>> Yes, but what about 32-bit systems?
>>>
>>> The overflow indeed happens after 2^32 increments, just like seqlock.
>>> The question we need to ask is therefore: if 2^32 is good enough for
>>> seqlock, why isn't it good enough for SRCU ?
>> I think Paul said wrap around does happen with SRCU on 32-bit but I'll
>> let him talk more about it. If 32-bit is good enough, let us also drop
>> the size of the counters for 64-bit then?
>>>>>> There are other synchronization algorithms such as seqlocks which are
>>>>>> quite happy with much less protection against overflow (using a 32-bit
>>>>>> counter even on 64-bit architectures).
>>>>>
>>>>> The seqlock is an interesting point.
>>>>>
>>>>>> For practical purposes, I suspect this issue is really just theoretical.
>>>>>
>>>>> I have to ask, what is the benefit of avoiding a flip and scanning
>>>>> active readers? Is the issue about grace period delay or performance?
>>>>> If so, it might be worth prototyping that approach and measuring using
>>>>> rcutorture/rcuscale. If there is significant benefit to current
>>>>> approach, then IMO it is worth exploring.
>>>
>>> The main benefit I expect is improved performance of the grace period
>>> implementation in common cases where there are few or no readers
>>> present, especially on machines with many cpus.
>>>
>>> It allows scanning both periods (0/1) for each cpu within the same pass,
>>> therefore loading both period's unlock counters sitting in the same
>>> cache line at once (improved locality), and then loading both period's
>>> lock counters, also sitting in the same cache line.
>>>
>>> It also allows skipping the period flip entirely if there are no readers
>>> present, which is an -arguably- tiny performance improvement as well.
>> The issue of counter wrap aside, what if a new reader always shows up
>> in the active index being scanned, then can you not delay the GP
>> indefinitely? It seems like writer-starvation is possible then (sure
>> it is possible also with preemption after reader-index-sampling, but
>> scanning active index deliberately will make that worse). Seqlock does
>> not have such writer starvation just because the writer does not care
>> about what the readers are doing.
>
> No, it's not possible for "current index" readers to starve the g.p. with the side-rcu scheme, because the initial pass (sampling both periods) only opportunistically skips flipping the period if there happens to be no readers in both periods.
>
> If there are readers in the "non-current" period, the grace period waits for them.
>
> If there are readers in the "current" period, it flips the period and then waits for them.

OK, glad you already do that. This is what I was leaning toward in my previous email as well, that is, doing a hybrid approach. Sorry, I did not know the details of your side-RCU well enough to know you were already doing something like that.

>
>> That said, the approach of scanning both counters does seem attractive
>> for when there are no readers, for the reasons you mentioned. Maybe a
>> heuristic to count the number of readers might help? If we are not
>> reader-heavy, then scan both. Otherwise, just scan the inactive ones,
>> and also couple that heuristic with the number of CPUs. I am
>> interested in working on such a design with you!
>> Let us do it and prototype/measure. ;-)
>
> Considering that it would add extra complexity, I'm unsure what that extra heuristic would improve over just scanning both periods in the first pass.

Makes sense. I think you already implement a form of heuristic indirectly, by flipping when scanning both was not fruitful.

> I'll be happy to work with you on such a design :) I think we can borrow quite a few concepts from side-rcu for this. Please be aware that my time is limited though, as I'm currently supposed to be on vacation. :)

Oh, I was more referring to after the holidays. I am also starting vacation soon and limited in cycles ;-). It is probably better to enjoy the holidays and come back to this after.

I do want to finish my memory barrier studies of SRCU over the holidays since I have been deep in the hole with that already. Back to the post-flip memory barrier here, since I now think even that might not be needed…

Cheers,

 - Joel

>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
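
For anyone following the thread, here is a rough, single-threaded sketch in C of the first-pass decision as described in the quoted text: scan both periods in one pass (unlock counters first, then lock counters), skip the flip if neither period has readers, otherwise wait on the non-current period and/or flip and wait on the current one. This is not the side-rcu or SRCU code. The names `NR_CPUS`, `percpu_count`, `gp_state`, `gp_plan`, `scan_both_periods` and `plan_grace_period` are all made up for illustration, and the memory barriers and atomic loads a real implementation needs are left out. What happens to the flip when only the non-current period has readers is not spelled out in the thread, so the sketch simply reports the two conditions separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4 /* hypothetical CPU count, for the sketch only */

/*
 * Hypothetical per-CPU layout: the two periods' counters of the same
 * kind sit next to each other, so one pass can load both at once.
 */
struct percpu_count {
	uintptr_t begin[2]; /* read-side lock counts, periods 0 and 1 */
	uintptr_t end[2];   /* read-side unlock counts, periods 0 and 1 */
};

struct gp_state {
	struct percpu_count cpu[NR_CPUS];
	unsigned int period; /* current period: 0 or 1 */
};

/* What the grace period has to do after the first pass. */
struct gp_plan {
	bool wait_noncurrent;        /* readers seen in the non-current period */
	bool flip_then_wait_current; /* readers seen in the current period */
};

/*
 * Scan both periods in a single pass over the CPUs: first all unlock
 * counters, then all lock counters. Unsigned arithmetic wraps, so the
 * comparison is modulo 2^N (N = width of uintptr_t).
 */
static void scan_both_periods(const struct gp_state *gp, bool active[2])
{
	uintptr_t diff[2] = { 0, 0 };
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		diff[0] -= gp->cpu[cpu].end[0];
		diff[1] -= gp->cpu[cpu].end[1];
	}
	/* A real implementation needs a memory barrier here (omitted). */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		diff[0] += gp->cpu[cpu].begin[0];
		diff[1] += gp->cpu[cpu].begin[1];
	}
	active[0] = diff[0] != 0;
	active[1] = diff[1] != 0;
}

/* First pass: if both flags come back false, the flip is skipped. */
static struct gp_plan plan_grace_period(const struct gp_state *gp)
{
	struct gp_plan plan;
	bool active[2];
	unsigned int cur = gp->period;

	assert(cur <= 1);
	scan_both_periods(gp, active);
	plan.wait_noncurrent = active[cur ^ 1];
	plan.flip_then_wait_current = active[cur];
	return plan;
}
```

With all counters at zero, `plan_grace_period()` returns both flags false, which corresponds to the "skip the flip entirely" case. A reader counted in `begin[]` of the current period without a matching `end[]` sets `flip_then_wait_current`, which is why current-period readers cannot starve the grace period in this scheme.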