On Sun, Jan 26, 2025 at 09:58:11PM -0500, Joel Fernandes wrote: > On Sun, Jan 26, 2025 at 9:55 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > > > On Sun, Jan 26, 2025 at 9:03 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > > > On Sun, Jan 26, 2025 at 08:22:23PM -0500, Joel Fernandes wrote: > > > > On Sun, Jan 26, 2025 at 8:13 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > > > > > > > > > Hi, Paul and Frederic, > > > > > > > > > > [...] > > > > > > > > On Sat, Jan 25, 2025 at 12:03:58AM +0100, Frederic Weisbecker wrote: > > > > > > > > > Le Fri, Dec 13, 2024 at 11:49:49AM -0800, Paul E. McKenney a écrit : > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > > > > > index 2f9c9272cd486..d2a91f705a4ab 100644 > > > > > > > > > > --- a/kernel/rcu/rcu.h > > > > > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > > > > > @@ -162,7 +162,7 @@ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > > > > > > > { > > > > > > > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > > > > > > > > > > > > > - return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1)); > > > > > > > > > > + return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1)); > > > > > > > > > > > > > > [...] > > > > > > > > > The way I understand it is that rcu_state.gp_seq might be seen started while > > > > > > > > > root_rnp->gp_seq is not. So rcu_seq_snap() on the started rcu_state.gp_seq > > > > > > > > > may return maximum 2 full GPs ahead of root_rnp->gp_seq. And therefore it takes below > > > > > > > > > 2 GPs to safely deduce we wrapped around. > > > > > > > > > > > > > > > > Exactly! > > > > > > > > > > > > > > > > > Should it be ULONG_CMP_LT(cur_s, s - (2 * (RCU_SEQ_STATE_MASK + 1))) ? > > > > > > > > > > > > > > > > Quite possibly. I freely admit that I allowed a bit of slop because > > > > > > > > time was of the essence (holidays and all that) and also it does not > > > > > > > > hurt much to lose a couple of counts out of a 2^32 cycle, to say nothing > > > > > > > > of the common-case 2^64 cycle. It would not hurt to be exact, but it > > > > > > > > would be necessary to convince ourselves that we were not off by one in > > > > > > > > the wrong direction. > > > > > > > > > > > > > > > > I would be happy to see a patch, as long as it was sufficiently > > > > > > > > convincing. > > > > > > > > > > > > > > I'm not so much concerned about being exact but rather about making > > > > > > > sure we still understand what we did within one year. We can leave one > > > > > > > more grace period than what we expect out of paranoia but, the most > > > > > > > important is that we comment about what we expect and why. Let me > > > > > > > prepare a patch for that. > > > > > > > > > > > > Even better! > > > > > > > > > > Do we really have users who could pass such a huge delta wrapped > > > > > around value to poll() i.e > ULONG_MAX/2 ? For 32-bit, that would be > > > > > 2 billion count since the get() (500 million GPs on 32-bit?). I am > > > > > curious if such a scenario should be a WARN() because also: If more > > > > > than ULONG_MAX/2 values are possible after get(), then a full or > > > > > multiple ULONG_MAX wraparound is also possible. This means then both > > > > > rcu_seq_done() and rcu_seq_done_exact() become unreliable anyway for > > > > > such stale get() values. > > > > > > > > > > I do agree with both your points on the side effect of the patch to > > > > > rcu_seq_done_exact(), but I am not fully convinced myself about > > > > > utility of rcu_seq_done_exact() versus the rcu_seq_done() but I could > > > > > be missing something. > > > > > > > > I want to modify my comment on reliability. rcu_seq_done_exact() is > > > > certainly more "reliable" than rcu_seq_done() for wraparound delta > > > > > ULONG_MAX/2. Still with such a huge wrap around it is not fail proof > > > > if it lands within the "3 Grace period" window, so if it is not super > > > > reliable and if the user should not rely on it, then I wonder if it is > > > > better to not do it and instead warn the user they are playing with > > > > fire. But a counter-argument might be, landing within the 3 GP window > > > > is quite low probability... > > > > > > It is also a harmless false negative. And another poll within a few > > > hundred milliseconds will set things straight. In contrast, if we > > > used rcu_seq_done(), it would be a good long time before we got out of > > > false-negative territory. > > > > True! > > > > > On a 32-bit system, if there was an expedited grace period every 10 > > > microseconds (just barely possible on small systems), then a 32-bit > > > counter would wrap in about 12 hours. So there would be a six-hour > > > false-negative zone. > > > > > > So an expedited grace period every 10 microseconds combined with > > > a six-hour polling delay is unlikely, but not out of the realm of > > > possibility. > > > > Assuming that every 10 microsecond of the entire 6 hours is used for > > an expedited GP, but I agree with your point. > > > > > Please feel free to nominate a comment. > > > > Ok thanks, I'll see what Frederic comes up with and can take it from there. > > I forgot to ask is the reason for keeping rcu_seq_done_exact() to have > more complexity only when needed, or does it make sense to rename > rcu_seq_done_exact() as rcu_seq_done_done() and get rid of the old > rcu_seq_done()? We might well be able to use the rcu_seq_done_exact() algorithm for all the uses, but that will require very careful review and testing. Please keep in mind that there are quite a few uses, that the benefit is small, and I was lazy. ;-) The difference with the polling APIs is that a user might well legitimately hang onto a cookie for six hours, for example, by having used it to tag a data structure in a block cache or similar. Thanx, Paul