Re: kernel 6.10

On Sun, 2024-07-28 at 11:33 +0300, Dan Aloni wrote:
> On 2024-07-28 02:57:42, Hristo Venev wrote:
> > On Sun, 2024-07-28 at 02:34 +0200, Hristo Venev wrote:
> > > On Sun, 2024-07-21 at 16:40 +0000, Trond Myklebust wrote:
> > > > On Sun, 2024-07-21 at 14:03 +0300, Dan Aloni wrote:
> > > > > On 2024-07-16 16:09:54, Trond Myklebust wrote:
> > > > > > [..]
> > > > > > 	gdb -batch -quiet -ex 'list *(nfs_folio_find_private_request+0x3c)' -ex quit nfs.ko
> > > > > > 
> > > > > > 
> > > > > > I suspect this will show that the problem is occurring inside the
> > > > > > function folio_get_private(), but I'd like to be sure that is the
> > > > > > case.
> > > > > 
> > > > > I would suspect that `->private_data` gets corrupted somehow. Maybe
> > > > > the folio_test_private() call needs to be protected by either the
> > > > > &mapping->i_private_lock, or folio lock?
> > > > > 
> > > > 
> > > > If the problem is indeed happening in "folio_get_private()", then the
> > > > dereferenced address value of 00000000000003a6 would seem to indicate
> > > > that the pointer value of 'folio' itself is screwed up, doesn't it?
> > > 
> > > The NULL dereference appears to be at the
> > > `WARN_ON_ONCE(req->wb_head != req);` check.
> > > 
> > > On my kernel the offset inside `nfs_folio_find_private_request` is
> > > +0x3f, but the address is again 0x3a6, meaning that `req` is for some
> > > reason set to 0x356 (the crash is on `cmp %rbp,0x50(%rbp)`).
> > 
> > ... and 0x356 happens to be NETFS_FOLIO_COPY_TO_CACHE. Maybe the
> > NETFS_RREQ_USE_PGPRIV2 flag is lost somehow?
> 

Why is netfs setting folio->private at all when it is running on top of
NFS? It doesn't own that field.

NFS uses folio->private to cache a pointer to any write requests that
are pending for that folio.
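
For reference, the arithmetic behind the 0x356 value quoted above can be sketched as follows. This is only an illustration: the 0x50 offset of wb_head is taken from the quoted `cmp %rbp,0x50(%rbp)` instruction and may differ between builds, and req_from_fault_addr() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Offset of wb_head inside the request structure, as implied by the
 * quoted `cmp %rbp,0x50(%rbp)` instruction. Assumption: this value is
 * specific to the reporter's build. */
#define WB_HEAD_OFFSET 0x50u

/* Hypothetical helper: given the faulting address of the wb_head load,
 * recover the value that was used as the `req` pointer. */
static uintptr_t req_from_fault_addr(uintptr_t fault_addr)
{
	return fault_addr - WB_HEAD_OFFSET;
}
```

With the reported fault address, req_from_fault_addr(0x3a6) yields 0x356, which is the value identified in the thread as NETFS_FOLIO_COPY_TO_CACHE.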

> Seems NETFS_FOLIO_COPY_TO_CACHE relates to fscache use, you are
> activating that, right?
> 
> Also in addition to my suggestion earlier, I think perhaps we need to
> use `folio_attach_private` and `folio_detach_private` instead of
> directly using `folio_set_private`, for which the NFS client seems to be
> the only direct user.

No. The only difference there is that folio_attach_private takes an
extra reference to the folio, which should be redundant given that
nfs_page_assign_folio() already does this for us.
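
To make the difference concrete, here is a toy user-space model, not kernel code. struct toy_folio and the toy_* helpers are hypothetical stand-ins that only model the reference count, the private flag and the private pointer.

```c
#include <stddef.h>

/* Toy stand-in for struct folio: only the fields this sketch needs. */
struct toy_folio {
	int refcount;
	int has_private;	/* models the PG_private flag */
	void *private;
};

/* Models setting the private data directly: store the pointer and set
 * the flag, without touching the reference count. */
static void toy_set_private(struct toy_folio *f, void *data)
{
	f->private = data;
	f->has_private = 1;
}

/* Models folio_attach_private(): the same as above, plus one extra
 * reference on the folio. */
static void toy_attach_private(struct toy_folio *f, void *data)
{
	f->refcount++;
	toy_set_private(f, data);
}
```

In this model both paths leave the same pointer and flag in place; the only observable difference is the reference count, which is the reference the request already holds in the NFS case.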


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx





