Hello Krister,

> -----Original Message-----
> From: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>
> Sent: Tuesday, November 14, 2023 11:11 PM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@xxxxxxxxx>
> Cc: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>;
> intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Kurmi, Suresh Kumar
> <suresh.kumar.kurmi@xxxxxxxxx>; Saarinen, Jani <jani.saarinen@xxxxxxxxx>;
> Miklos Szeredi <mszeredi@xxxxxxxxxx>
> Subject: Re: Regression on linux-next (next-20231107)
>
> Hi Chaitanya,
>
> On Mon, Nov 13, 2023 at 06:21:57AM +0000, Borah, Chaitanya Kumar wrote:
> > Hello Krister,
> >
> > Any luck with this?
> >
> > > -----Original Message-----
> > > From: Borah, Chaitanya Kumar
> > > Sent: Friday, November 10, 2023 9:09 AM
> > > To: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>
> > > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Kurmi, Suresh Kumar
> > > <Suresh.Kumar.Kurmi@xxxxxxxxx>; Saarinen, Jani
> > > <jani.saarinen@xxxxxxxxx>; Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > Subject: RE: Regression on linux-next (next-20231107)
> > >
> > > Hello Krister,
> > >
> > > > -----Original Message-----
> > > > From: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>
> > > > Sent: Friday, November 10, 2023 2:10 AM
> > > > To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@xxxxxxxxx>
> > > > Cc: kjlx@xxxxxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx;
> > > > Kurmi, Suresh Kumar <suresh.kumar.kurmi@xxxxxxxxx>; Saarinen, Jani
> > > > <jani.saarinen@xxxxxxxxx>; Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > > Subject: Re: Regression on linux-next (next-20231107)
> > > >
> > > > Hi Chaitanya,
> > > >
> > > > On Thu, Nov 09, 2023 at 05:00:09PM +0000, Borah, Chaitanya Kumar wrote:
> > > > > Hello Krister,
> > > > >
> > > > > Hope you are doing well. I am Chaitanya from the linux graphics
> > > > > team in Intel.
> > > > >
> > > > > This mail is regarding a regression we are seeing in our CI
> > > > > runs[1] for some machines (dg2 and adl-p) on linux-next repository.
> > > > >
> > > > > Since the version next-20231107 [2], we are seeing the following
> > > > > error
> > > > >
> > > > > ```````````````````````````````````````````````````````````````````````````````
> > > > > <4>[ 32.015910] stack segment: 0000 [#1] PREEMPT SMP NOPTI
> > > > > <4>[ 32.021048] CPU: 15 PID: 766 Comm: fusermount Not tainted 6.6.0-next-20231107-next-20231107-g5cd631a52568+ #1
> > > > > <4>[ 32.031135] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> > > > > <4>[ 32.044657] RIP: 0010:fuse_evict_inode+0x61/0x150 [fuse]
> > > > > ```````````````````````````````````````````````````````````````````````````````
> > > > >
> > > > > Details log can be found in [3].
> > > > >
> > > > > After bisecting the tree, the following patch [4] seems to be
> > > > > the first "bad" commit
> > > > >
> > > > > ```````````````````````````````````````````````````````````````````````````````
> > > > > 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5 is the first bad commit
> > > > > commit 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5
> > > > > Author: Krister Johansen kjlx@xxxxxxxxxxxxxxxxxx
> > > > > Date: Fri Nov 3 10:39:47 2023 -0700
> > > > >
> > > > > fuse: share lookup state between submount and its parent
> > > > >
> > > > > Fuse submounts do not perform a lookup for the nodeid that they inherit
> > > > > from their parent. Instead, the code decrements the nlookup on the
> > > > > submount's fuse_inode when it is instantiated, and no forget is
> > > > > performed when a submount root is evicted.
> > > > >
> > > > > Trouble arises when the submount's parent is evicted despite the
> > > > > submount itself being in use.
> > > > > In this author's case, the submount was
> > > > > in a container and detached from the initial mount namespace via a
> > > > > MNT_DETACH operation. When memory pressure triggered the shrinker, the
> > > > > inode from the parent was evicted, which triggered enough forgets to
> > > > > render the submount's nodeid invalid.
> > > > >
> > > > > Since submounts should still function, even if their parent goes away,
> > > > > solve this problem by sharing refcounted state between the parent and
> > > > > its submount. When all of the references on this shared state reach
> > > > > zero, it's safe to forget the final lookup of the fuse nodeid.
> > > > >
> > > > > ```````````````````````````````````````````````````````````````````````````````
> > > > >
> > > > > We also verified that if we revert the patch the issue is not seen.
> > > > >
> > > > > Could you please check why the patch causes this regression and
> > > > > provide a fix if necessary?
> > > >
> > > > Apologies for the inconvenience. I've reproduced the problem,
> > > > tested a fix, and am in the process of preparing patches to send to Miklos.
> > > > I'll cc the people on this e-mail in that thread.
> > > >
> > > > > [3]
> > > > > http://gfx-ci.igk.intel.com/tree/linux-next/next-20231109/bat-dg2-14/boot0.txt
> > > >
> > > > This link didn't resolve in DNS when I tried to access it. I
> > > > needed to use intel-gfx-ci.01.org as the hostname instead.
> > > >
> > >
> > > My bad. I realized it too late. Hope you found the logs. If not, here they are.
> > >
> > > https://intel-gfx-ci.01.org/tree/linux-next/next-20231109/bat-dg2-14/boot0.txt
>
> Yes, I sent Miklos a patch for this on the 9th. That was pulled into fuse/for-next.
> You can either apply this patch directly:
>
> https://lore.kernel.org/linux-fsdevel/CAJfpegtOKLDy-j=oi8BsT+xjFnO+Mk7=8VxSDuyi-bxhRSGMKQ@xxxxxxxxxxxxxx/T/#m1116af8fd8428f2871d527b7fc5d6351bd6f199a
>
> Or sync with a version of linux-next that contains the fix, which should be at
> least the 11/10 branch.
>

Thanks a lot for the fix. Issue is resolved for us now.

Regards

Chaitanya

> Thanks,
>
> -K