Re: kernel 6.10

Hristo Venev <hristo@xxxxxxxxxx> · Sun, 28 Jul 2024 02:34:12 +0200

On Sun, 2024-07-21 at 16:40 +0000, Trond Myklebust wrote:
> On Sun, 2024-07-21 at 14:03 +0300, Dan Aloni wrote:
> > On 2024-07-16 16:09:54, Trond Myklebust wrote:
> > > [..]
> > > 	gdb -batch -quiet -ex 'list *(nfs_folio_find_private_request+0x3c)' -ex quit nfs.ko
> > > 
> > > 
> > > I suspect this will show that the problem is occurring inside the
> > > function folio_get_private(), but I'd like to be sure that is the
> > > case.
> > 
> > I would suspect that `->private_data` gets corrupted somehow. Maybe
> > the folio_test_private() call needs to be protected by either the
> > &mapping->i_private_lock, or folio lock?
> > 
> 
> If the problem is indeed happening in "folio_get_private()", then the
> dereferenced address value of 00000000000003a6 would seem to indicate
> that the pointer value of 'folio' itself is screwed up, doesn't it?

The NULL dereference appears to be at the `WARN_ON_ONCE(req->wb_head !=
req);` check.

On my kernel the offset inside `nfs_folio_find_private_request` is
+0x3f, but the address is again 0x3a6, meaning that `req` is for some
reason set to 0x356 (the crash is on `cmp %rbp,0x50(%rbp)`).
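
For anyone wanting to check the arithmetic: the memory operand
`0x50(%rbp)` resolves to %rbp + 0x50, so subtracting the displacement
from the faulting address gives the value that was in %rbp, i.e. `req`.
A minimal sketch (the helper name is made up for illustration; the two
constants are the ones quoted above):

```python
# Hypothetical helper for illustration only; not part of the kernel or of gdb.
def base_from_fault(fault_addr: int, displacement: int) -> int:
    """Recover the base register value for an operand of the form disp(%reg)."""
    return fault_addr - displacement


FAULT_ADDR = 0x3A6    # faulting address reported in the oops
WB_HEAD_DISP = 0x50   # displacement from `cmp %rbp,0x50(%rbp)`

req = base_from_fault(FAULT_ADDR, WB_HEAD_DISP)
print(hex(req))  # prints 0x356
```

That only confirms that the two numbers are consistent with each other.
It says nothing about where the bogus value of `req` comes from.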

> 
> Since the value of 'folio' is being passed directly from
> write_cache_pages() as an argument to all the subsequent functions in
> the stack trace, then I'm somewhat confused.
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx