On Tue, Oct 12, 2021 at 08:28:32PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 12, 2021 at 05:32:15PM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 11, 2021 at 04:51:29PM +0200, Frederic Weisbecker wrote:
> > > Hi,
> > >
> > > No code change in this v2, only changelogs:
> > >
> > > * Add tags from Valentin and Sebastian
> > >
> > > * Remove last reference to SEGCBLIST_SOFTIRQ_ONLY (thanks Valentin)
> > >
> > > * Rewrite changelog for "rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check"
> > >   after off-list debates with Paul.
> > >
> > > * Remove the scenario with softirq interrupting rcuc on
> > >   "rcu/nocb: Limit number of softirq callbacks only on softirq" as it's
> > >   probably not possible (thanks Valentin).
> > >
> > > * Remove the scenario with task spent scheduling out accounted on tlimit
> > >   as it's not possible (thanks Valentin)
> > >   (see "rcu: Apply callbacks processing time limit only on softirq")
> > >
> > > * Fixed changelog of
> > >   "rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread"
> > >   (thanks Sebastian).
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > > 	rcu/rt-v2
> > >
> > > HEAD: 2c9349986d5f70a555195139665841cd98e9aba4
> > >
> > > Thanks,
> > > 	Frederic
> >
> > Nice!
> >
> > I queued these for further review and testing. I reworked the commit log
> > of 6/11 to give my idea of the reason, though I freely admit that this
> > reason is not as compelling as it no doubt seemed when I wrote that code.
>
> But in initial tests TREE04.5, TREE04.6, and TREE04.9 all hit the
> WARN_ON(1) in rcu_torture_barrier(), which indicates rcu_barrier()
> breakage. My best (but not so good) guess is a five-hour MTBF on a
> dual-socket system.
>
> I started an automated "git bisect" with each step running 100 hours
> of TREE04, but I would be surprised if anything useful comes of it.
> Pleased, mind you, but surprised.

Oops, trying those scenarios on my side as well.

Thanks!