Re: Fw: rc6 splat

On Sat, Apr 09, 2022 at 06:44:27AM +0800, Zhouyi Zhou wrote:
> [CC RCU and Frederic and Matthew]
> 
> Hi Paul
> 
> After several days of experiments on my ThinkPad P1 Gen 4 (Intel 11800H,
> 8 cores, 16 threads), I found that it is CONFIG_NFS_V4 that makes the
> difference.
> If I remove CONFIG_NFS_V4 from TASKS01, the probability of triggering
> the warning increases significantly!
> 
> And, Yes, there is no CONFIG_NFS_V4 in Matthew's original email.
> 
> This is very interesting. After debugging the kernel, I found that
> init_nfs_v4 does a lot of work. Could that work be helping the grace
> period complete during the test?
> I am very happy to continue researching this topic ;-)
> 
> Thank you for your guidance!

Thank you for chasing this down!  I am running some tests with
"--kconfig CONFIG_NFS_V4=n" to see what happens.
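
For reference, a test run like this might be launched roughly as in the
sketch below. It assumes the in-tree rcutorture script at
tools/testing/selftests/rcutorture/bin/kvm.sh and the TASKS01 scenario
mentioned in the quoted message; the helper name build_cmd is hypothetical,
and the script only prints the command line rather than starting a run.

```shell
#!/bin/sh
# Sketch only: print (do not run) a kvm.sh command line for a scenario
# with one Kconfig override. The script path is an assumption about where
# the rcutorture tooling lives in the kernel source tree.
KVM_SH=tools/testing/selftests/rcutorture/bin/kvm.sh

# build_cmd is a hypothetical helper.
# $1 = rcutorture scenario name, $2 = Kconfig override
build_cmd() {
    echo "$KVM_SH --configs $1 --kconfig $2"
}

# TASKS01 and CONFIG_NFS_V4=n are taken from this thread.
build_cmd TASKS01 CONFIG_NFS_V4=n
```

Printing the command first makes it easy to check the override before
committing machine time to the run.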

I am having some trouble imagining how init_nfs_v4() would help RCU Tasks
grace periods make forward progress, but it is a proven fact that the
objective universe has a much more capable imagination than I do.  ;-)

If left to yourself, what would be the next debugging step that you
would take?

							Thanx, Paul

> Thanks
> Zhouyi
> 
> On Tue, Mar 22, 2022 at 8:31 PM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
> >
> > Dear Frederic
> >
> > I may not be right; please correct me if I am wrong.
> >
> > On Tue, Mar 22, 2022 at 6:49 PM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Mar 21, 2022 at 08:57:46AM -0700, Paul E. McKenney wrote:
> > > > On Mon, Mar 21, 2022 at 11:46:28PM +0800, Zhouyi Zhou wrote:
> > > > > Hi Paul and Willy
> > > > >
> > > > > I can reproduce the bug. Following is what I do:
> > > > > 1.1 git clone https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git
> > > > > 1.2 cd linux
> > > > > > 1.3 wget -O .config http://154.223.142.244/20220321/config-20220321
> > > > > 1.4 make vmlinux -j 16
> > > > > 1.5 kvm -smp 4 -net none     -serial file:/tmp/console.log -m 512
> > > > > -kernel vmlinux -append "console=ttyS0"
> > > > > 1.6 the /tmp/console.log is uploaded to
> > > > > http://154.223.142.244/20220321/console.log
> > > > >
> > > > > 2.1 wget https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.17-rc6.tar.gz
> > > > > 2.2 - 2.6 the result is similar.
> > > > >
> > > > > I am very interested in this topic.
> > > > > > Could you please give me about a week to fully understand the
> > > > > > meaning of the warning, try the fixes one by one, and
> > > > > > find out what happens?
> > > >
> > > > Works for me!  The eventual fix likely involves some version of Valentin
> > > > Schneider's patchset that provides APIs that allow RCU to detect the
> > > > current preemption state.  Which can change at runtime.  A prototype of
> > > > this patch is on -rcu here:
> > > >
> > > > 2436ee0b4cea ("EXP preempt/dynamic: Introduce preempt mode accessors")
> > > >
> > > > It is entirely possible that the fix might need to go to mainline sooner
> > > > rather than later.
> > >
> > > I guess it's possible to do that but that patch alone shouldn't fix anything.
> > > Also, what is the issue exactly? :-)
> > The issue is that with a certain kernel configuration
> > (http://154.223.142.244/20220321/config-2022032), the mainline (and
> > -rcu) kernel warns
> > "call_rcu_tasks() has been failed" in rcu_tasks_verify_self_tests().
> > (http://154.223.142.244/20220321/console.log)
> >
> > Kind regards
> > Zhouyi
> > > I can't rewind the conversation far enough.
> > >
> > > Thanks.


