On Thu, Nov 17, 2022 at 10:39 PM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
>
> On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > > I ran 200 hours of TREE04 and got an RCU CPU stall warning. I ran 2000
> > > hours on v6.0, which precedes these commits, and everything passed.
> > >
> > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > At the moment, I have some concerns about this change.
> >
> > OK, so I have run a total of 8000 hours on v6.0 without failure. I have
> > run 4200 hours on rcu#revert_tick_dep with 15 failures. The ones I
> > looked at were RCU CPU stall warnings with timer failures.
> >
> > This data suggests that the kernel is not yet ready for that commit
> > to be reverted.
>
> But that branch has the three commits reverted:
>
> 1) tick: Detect and fix jiffies update stall
> 2) timers/nohz: Last resort update jiffies on nohz_full IRQ entry*
> 3) rcu: Make CPU-hotplug removal operations enable tick
>
> Reverting all of them is expected to fail anyway.
>
> What we would like to know is if reverting just 3) is fine. Because
> 1) and 2) are supposed to fix the underlying issue.
>
> I personally didn't manage to trigger failures with just reverting 3)
> after thousands hours. But it failed with reverting all of them.
>
> Has someone managed to trigger a failure with only 3) reverted?
>

Oh, a long long thread hides the history! Once it was done [1]. But I
have not kept the data (IIRC, it occupied ~1GB).
If the information and attachment in [1] are not enough for you, I am
happy to redo the test.

Thanks,

Pingfan

[1]: https://lore.kernel.org/all/CAFgQCTtLm-JXRyQfKo6-+P00SShVGujZGau+khmtCe1AiRodQA@xxxxxxxxxxxxxx/

> Thanks.