On Sat, Apr 09, 2022 at 06:44:27AM +0800, Zhouyi Zhou wrote:
> [CC RCU and Frederic and Matthew]
>
> Hi Paul
>
> After several days of experiments on my ThinkPad P1 gen 4 (Intel 11800H,
> 8 cores, 16 threads), I found that it is CONFIG_NFS_V4 that makes the
> difference.
> If I remove CONFIG_NFS_V4 from TASKS01, the probability of triggering
> the warning is significantly increased!
>
> And yes, there is no CONFIG_NFS_V4 in Matthew's original email.
>
> This is very interesting. After debugging the kernel, I found that
> init_nfs_v4 does a lot of work. Did it lend a grace period to the test?
> I am very happy to continue to do research on this topic ;-)
>
> Thank you for your guidance!

Thank you for chasing this down!

I am running some tests with "--kconfig CONFIG_NFS_V4=n" to see what
happens.

I am having some trouble imagining how init_nfs_v4() would help RCU Tasks
grace periods go forwards, but it is a proven fact that the objective
universe has a much more capable imagination than I do.  ;-)

If left to yourself, what would be the next debugging step that you
would take?

							Thanx, Paul

> Thanks
> Zhouyi
>
> On Tue, Mar 22, 2022 at 8:31 PM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
> >
> > Dear Frederic
> >
> > I may not be right, please correct me if so
> >
> > On Tue, Mar 22, 2022 at 6:49 PM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Mar 21, 2022 at 08:57:46AM -0700, Paul E. McKenney wrote:
> > > > On Mon, Mar 21, 2022 at 11:46:28PM +0800, Zhouyi Zhou wrote:
> > > > > Hi Paul and Willy
> > > > >
> > > > > I can reproduce the bug.
> > > > > Following is what I do:
> > > > > 1.1 git clone https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git
> > > > > 1.2 cd linux
> > > > > 1.3 cp http://154.223.142.244/20220321/config-20220321 to .config
> > > > > 1.4 make vmlinux -j 16
> > > > > 1.5 kvm -smp 4 -net none -serial file:/tmp/console.log -m 512
> > > > > -kernel vmlinux -append "console=ttyS0"
> > > > > 1.6 the /tmp/console.log is uploaded to
> > > > > http://154.223.142.244/20220321/console.log
> > > > >
> > > > > 2.1 wget https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.17-rc6.tar.gz
> > > > > 2.2 - 2.6 the result is similar.
> > > > >
> > > > > I am very interested in this topic.
> > > > > Could you please give me about a week to fully understand the
> > > > > meaning of the warning, try the fixes one by one, and
> > > > > find out what happens?
> > > >
> > > > Works for me!  The eventual fix likely involves some version of Valentin
> > > > Schneider's patchset that provides APIs that allow RCU to detect the
> > > > current preemption state.  Which can change at runtime.  A prototype of
> > > > this patch is on -rcu here:
> > > >
> > > > 2436ee0b4cea ("EXP preempt/dynamic: Introduce preempt mode accessors")
> > > >
> > > > It is entirely possible that the fix might need to go to mainline sooner
> > > > rather than later.
> > >
> > > I guess it's possible to do that but that patch alone shouldn't fix anything.
> > > Also, what is the issue exactly? :-)
> > The issue is that with a certain kernel configuration
> > (http://154.223.142.244/20220321/config-2022032), the mainline (and
> > -rcu) kernel will warn
> > "call_rcu_tasks() has been failed" in rcu_tasks_verify_self_tests.
> > (http://154.223.142.244/20220321/console.log)
> >
> > Kind regards
> > Zhouyi
> > > I can't rewind the conversation far enough.
> > >
> > > Thanks.
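
For anyone wanting to put a rough number on "the probability of triggering
the warning", one possible approach is to keep the console log from each
boot and count how many of them contain the self-test failure message.
The sketch below is only an illustration under stated assumptions: the
function names and the one-log-per-boot layout are hypothetical, and the
match string is simply the warning text quoted in this thread.

```python
from pathlib import Path

# Warning text as quoted in this thread; adjust it if your kernel
# prints the message differently.
WARNING = "call_rcu_tasks() has been failed"


def log_has_warning(text: str, needle: str = WARNING) -> bool:
    """Return True if one console log's text contains the warning."""
    return needle in text


def failure_rate(log_paths) -> float:
    """Fraction of console logs (assumed one per boot) with the warning."""
    paths = list(log_paths)
    if not paths:
        raise ValueError("no console logs given")
    hits = sum(
        log_has_warning(Path(p).read_text(errors="replace")) for p in paths
    )
    return hits / len(paths)


if __name__ == "__main__":
    import sys

    logs = sys.argv[1:]
    if logs:
        rate = failure_rate(logs)
        print(f"{rate:.2%} of {len(logs)} boots hit the warning")
```

Running it over one set of logs built with CONFIG_NFS_V4 and another set
built without it would give two rates to compare. With only a handful of
boots per configuration, the difference may well be noise.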