Re: Regression on linux-next (next-20231107)

Hello Krister,

> -----Original Message-----
> From: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>
> Sent: Tuesday, November 14, 2023 11:11 PM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@xxxxxxxxx>
> Cc: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>;
> intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Kurmi, Suresh Kumar
> <suresh.kumar.kurmi@xxxxxxxxx>; Saarinen, Jani <jani.saarinen@xxxxxxxxx>;
> Miklos Szeredi <mszeredi@xxxxxxxxxx>
> Subject: Re: Regression on linux-next (next-20231107)
> 
> Hi Chaitanya,
> 
> On Mon, Nov 13, 2023 at 06:21:57AM +0000, Borah, Chaitanya Kumar wrote:
> > Hello Krister,
> >
> > Any luck with this?
> >
> > > -----Original Message-----
> > > From: Borah, Chaitanya Kumar
> > > Sent: Friday, November 10, 2023 9:09 AM
> > > To: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>
> > > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Kurmi, Suresh Kumar
> > > <Suresh.Kumar.Kurmi@xxxxxxxxx>; Saarinen, Jani
> > > <jani.saarinen@xxxxxxxxx>; Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > Subject: RE: Regression on linux-next (next-20231107)
> > >
> > > Hello Krister,
> > >
> > > > -----Original Message-----
> > > > From: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>
> > > > Sent: Friday, November 10, 2023 2:10 AM
> > > > To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@xxxxxxxxx>
> > > > Cc: kjlx@xxxxxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx;
> > > > Kurmi, Suresh Kumar <suresh.kumar.kurmi@xxxxxxxxx>; Saarinen, Jani
> > > > <jani.saarinen@xxxxxxxxx>; Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > > Subject: Re: Regression on linux-next (next-20231107)
> > > >
> > > > Hi Chaitanya,
> > > >
> > > > > On Thu, Nov 09, 2023 at 05:00:09PM +0000, Borah, Chaitanya Kumar wrote:
> > > > > Hello Krister,
> > > > >
> > > > > > Hope you are doing well. I am Chaitanya from the Linux graphics
> > > > > > team at Intel.
> > > > >
> > > > > > This mail is regarding a regression we are seeing in our CI
> > > > > > runs [1] on some machines (dg2 and adl-p) with the linux-next
> > > > > > repository.
> > > > >
> > > > > > Since the version next-20231107 [2], we are seeing the following
> > > > > > error:
> > > > > >
> > > > > > ````````````````````````````````````````````````````````````````
> > > > > > <4>[   32.015910] stack segment: 0000 [#1] PREEMPT SMP NOPTI
> > > > > > <4>[   32.021048] CPU: 15 PID: 766 Comm: fusermount Not tainted 6.6.0-next-20231107-next-20231107-g5cd631a52568+ #1
> > > > > > <4>[   32.031135] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> > > > > > <4>[   32.044657] RIP: 0010:fuse_evict_inode+0x61/0x150 [fuse]
> > > > > > ````````````````````````````````````````````````````````````````
> > > > >
> > > > > > A detailed log can be found in [3].
> > > > >
> > > > > After bisecting the tree, the following patch [4] seems to be
> > > > > the first "bad" commit
> > > > >
> > > > >
> > > > > ````````````````````````````````````````````````````````````````
> > > > > ````
> > > > > ``
> > > > > ```````````````````````````````````
> > > > > 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5 is the first bad commit
> > > > > commit 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5
> > > > > Author: Krister Johansen kjlx@xxxxxxxxxxxxxxxxxx
> > > > > Date:   Fri Nov 3 10:39:47 2023 -0700
> > > > >
> > > > >     fuse: share lookup state between submount and its parent
> > > > >
> > > > > >     Fuse submounts do not perform a lookup for the nodeid that they inherit
> > > > >     from their parent.  Instead, the code decrements the nlookup on the
> > > > >     submount's fuse_inode when it is instantiated, and no forget is
> > > > >     performed when a submount root is evicted.
> > > > >
> > > > > >     Trouble arises when the submount's parent is evicted despite the
> > > > > >     submount itself being in use.  In this author's case, the submount was
> > > > > >     in a container and detached from the initial mount namespace via a
> > > > > >     MNT_DETACH operation.  When memory pressure triggered the shrinker, the
> > > > > >     inode from the parent was evicted, which triggered enough forgets to
> > > > > >     render the submount's nodeid invalid.
> > > > >
> > > > > >     Since submounts should still function, even if their parent goes away,
> > > > > >     solve this problem by sharing refcounted state between the parent and
> > > > > >     its submount.  When all of the references on this shared state reach
> > > > > >     zero, it's safe to forget the final lookup of the fuse nodeid.
> > > > >
> > > > >
> > > > > ````````````````````````````````````````````````````````````````
> > > > > ````
> > > > > ``
> > > > > ```````````````````````````````````
> > > > >
> > > > > We also verified that if we revert the patch the issue is not seen.
> > > > >
> > > > > > Could you please check why the patch causes this regression and
> > > > > > provide a fix if necessary?
> > > >
> > > > > Apologies for the inconvenience.  I've reproduced the problem,
> > > > > tested a fix, and am in the process of preparing patches to send to
> > > > > Miklos.  I'll cc the people on this e-mail in that thread.
> > > >
> > > > > [3]
> > > > > http://gfx-ci.igk.intel.com/tree/linux-next/next-20231109/bat-dg
> > > > > 2-14
> > > > > /b
> > > > > oot0.txt
> > > >
> > > > > This link didn't resolve in DNS when I tried to access it.  I
> > > > > needed to use intel-gfx-ci.01.org as the hostname instead.
> > > >
> > >
> > > > My bad. I realized it too late. Hope you found the logs. If not, here
> > > > they are.
> > >
> > > > https://intel-gfx-ci.01.org/tree/linux-next/next-20231109/bat-dg2-14/boot0.txt
> 
> Yes, I sent Miklos a patch for this on the 9th.  That was pulled into
> fuse/for-next.  You can either apply this patch directly:
> 
> https://lore.kernel.org/linux-fsdevel/CAJfpegtOKLDy-j=oi8BsT+xjFnO+Mk7=8VxSDuyi-bxhRSGMKQ@xxxxxxxxxxxxxx/T/#m1116af8fd8428f2871d527b7fc5d6351bd6f199a
> 
> Or sync with a version of linux-next that contains the fix, which should be at
> least the 11/10 branch.
> 

Thanks a lot for the fix. The issue is resolved for us now.

Regards

Chaitanya

> Thanks,
> 
> -K



