On 9/6/2022 3:11 PM, Frederic Weisbecker wrote:
> On Tue, Sep 06, 2022 at 12:43:52PM -0400, Joel Fernandes wrote:
>> On 9/6/2022 12:38 PM, Joel Fernandes wrote:
>> Ah, now I know why I got confused. I *used* to flush the bypass list before when
>> !lazy CBs showed up. Paul suggested this is overkill. In this old overkill
>> method, I was missing a wake up which was likely causing the boot regression.
>> Forcing a wake up fixed that. Now in v5 I make it such that I don't do the flush
>> on a !lazy rate-limit.
>>
>> I am sorry for the confusion. Either way, in my defense this is just an extra
>> bit of code that I have to delete. This code is hard. I have mostly relied on a
>> test-driven development. But now thanks to this review and I am learning the
>> code more and more...
>
> Yeah this code is hard.
>
> Especially as it's possible to flush from both sides and queue the timer
> from both sides. And both sides read the bypass/lazy counter locklessly.
> But only call_rcu_*() can queue/increase the bypass size whereas only
> nocb_gp_wait() can cancel the timer. Phew!
>

Haha, Indeed ;-)

> Among the many possible dances between rcu_nocb_try_bypass()
> and nocb_gp_wait(), I haven't found a way yet for the timer to be
> set to LAZY when it should be BYPASS (or other kind of accident such
> as an ignored callback).
> In the worst case we may arm an earlier timer than necessary
> (RCU_NOCB_WAKE_BYPASS instead of RCU_NOCB_WAKE_LAZY for example).
>
> Famous last words...

Agreed.

On the issue of regressions with non-lazy things being treated as lazy, I was
thinking of adding a bounded-time-check to:

[PATCH v5 08/18] rcu: Add per-CB tracing for queuing, flush and invocation.

Where, if a non-lazy CB takes an abnormally long time to execute (say it was
subject to a race-condition), it would splat. This can be done because I am
tracking the queue-time in the rcu_head in that patch.

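To make the bounded-time-check idea concrete, here is a rough userspace sketch,
not code from the patch. Everything in it is hypothetical: struct cb_info, its
fields, the helper names and the NONLAZY_MAX_LATENCY_MS bound are made up for
illustration. In the kernel the timestamp would be the queue-time tracked in
the rcu_head, and the fprintf() would presumably be a WARN-style splat.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bound: how long a non-lazy callback may wait, in ms. */
#define NONLAZY_MAX_LATENCY_MS 1000u

/* Hypothetical stand-in for the queue-time tracked per callback. */
struct cb_info {
	uint64_t queue_time_ms;	/* timestamp taken when the CB was queued */
	bool lazy;		/* true if the CB was queued as lazy */
};

/* Record the queue time and laziness when the callback is queued. */
static void cb_mark_queued(struct cb_info *cb, uint64_t now_ms, bool lazy)
{
	cb->queue_time_ms = now_ms;
	cb->lazy = lazy;
}

/*
 * Check done at invocation time. Returns true, and prints a warning,
 * if a non-lazy callback waited longer than the bound. Lazy callbacks
 * are expected to wait, so they are exempt.
 */
static bool cb_check_latency(const struct cb_info *cb, uint64_t now_ms)
{
	uint64_t waited_ms = now_ms - cb->queue_time_ms;

	if (cb->lazy || waited_ms <= NONLAZY_MAX_LATENCY_MS)
		return false;

	fprintf(stderr, "non-lazy CB waited %llu ms (bound %u ms)\n",
		(unsigned long long)waited_ms, NONLAZY_MAX_LATENCY_MS);
	return true;
}
```

The check is deliberately one-sided: it only fires for callbacks marked
non-lazy, since that is the case where an accidental long delay would indicate
a non-lazy CB being treated as lazy.
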
On another note, boot time regressions show up pretty quickly (at least on
ChromeOS) when non-lazy things become lazy and so far with the latest code it
has fortunately been pretty well behaved.

Thanks,

 - Joel