Re: Is this nfsd kernel oops known?

On 1 Sep 2022, at 9:51, Olga Kornievskaia wrote:

On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:

On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
Hi folks,

Is this a known nfsd kernel oops in 6.0-rc1? I was running xfstests on a
pre-RHEL-9.1 client against a 6.0-rc1 server when it panicked.

[ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
[ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
[ 5554.771899]

No, I haven't seen this one. I'm guessing the page pointer passed to
kernel_sendpage was probably NULL, so this may be a case where something
walked off the end of the rq_pages array?
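Jeff's hypothesis can be illustrated with a small user-space mock. It assumes a request keeps its pages in a fixed-size pointer array padded with NULL, plus a separate count of populated slots. The names `struct mock_rqst`, `MOCK_MAXPAGES`, `page_at_unchecked()` and `page_at_checked()` are hypothetical and are not nfsd's actual code. The point is that an index just past the populated range does not fault by itself. It only yields a NULL page pointer, which something like kernel_sendpage() would then dereference later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT nfsd's real types.
 * MOCK_MAXPAGES plays the role of a fixed per-request page budget. */
#define MOCK_MAXPAGES 8

struct mock_page { int id; };

struct mock_rqst {
    /* Fixed-size array; unused slots stay NULL when zero-initialized. */
    struct mock_page *pages[MOCK_MAXPAGES + 1];
    size_t npages; /* number of slots actually populated */
};

/* Unchecked lookup: an index past npages is still inside the array, so it
 * quietly returns NULL instead of failing here. The crash only shows up
 * later, when something dereferences the result. */
struct mock_page *page_at_unchecked(const struct mock_rqst *rq, size_t i)
{
    assert(i <= MOCK_MAXPAGES); /* stays within the array itself */
    return rq->pages[i];
}

/* Checked lookup: reports an out-of-range index at the point of the walk.
 * Returns 0 and stores the page in *out, or -1 if i >= npages. */
int page_at_checked(const struct mock_rqst *rq, size_t i,
                    struct mock_page **out)
{
    if (i >= rq->npages)
        return -1;
    *out = rq->pages[i];
    return 0;
}
```

Suppose three pages are populated. Then `page_at_unchecked(&rq, 3)` returns NULL, while `page_at_checked(&rq, 3, &p)` returns -1. The checked version catches the overrun where it happens rather than inside the send path.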

Beyond that I can't tell much from just this stack trace. It might be
nice to see what line of code kernel_sendpage+0x60 refers to on your
kernel.

After getting debug symbols, this is what gdb told me:

(gdb) l *(kernel_sendpage+0x60)
0xffffffff81cbd570 is in kernel_sendpage (./include/linux/page-flags.h:487).
482 TESTCLEARFLAG(LRU, lru, PF_HEAD)
483 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
484 TESTCLEARFLAG(Active, active, PF_HEAD)
485 PAGEFLAG(Workingset, workingset, PF_HEAD)
486 TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
487 __PAGEFLAG(Slab, slab, PF_NO_TAIL)
488 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
489 PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some filesystems */
490
491 /* Xen */



I just oopsed here too on 6.0-rc3, but I didn't get a vmcore. I'll get the
next one and hopefully take it apart a bit further. My oops was at
kernel_sendpage+0x1d:

    crash> dis -lrx kernel_sendpage+0x52
    /usr/local/src/linux/net/socket.c: 3557
    0xffffffff9caf0160 <kernel_sendpage>:           nopl   0x0(%rax,%rax,1) [FTRACE NOP]
    /usr/local/src/linux/net/socket.c: 3558
    0xffffffff9caf0165 <kernel_sendpage+0x5>:       push   %rbx
    0xffffffff9caf0166 <kernel_sendpage+0x6>:       mov    %rdi,%rbx
    0xffffffff9caf0169 <kernel_sendpage+0x9>:       sub    $0x18,%rsp
    0xffffffff9caf016d <kernel_sendpage+0xd>:       mov    0x20(%rdi),%rax
    0xffffffff9caf0171 <kernel_sendpage+0x11>:      mov    0xa0(%rax),%r9
    0xffffffff9caf0178 <kernel_sendpage+0x18>:      test   %r9,%r9
    0xffffffff9caf017b <kernel_sendpage+0x1b>:      je     0xffffffff9caf01b2 <kernel_sendpage+0x52>
    /usr/local/src/linux/./include/linux/page-flags.h: 253
    0xffffffff9caf017d <kernel_sendpage+0x1d>:      mov    0x8(%rsi),%rax
    /usr/local/src/linux/./include/linux/page-flags.h: 255

Yes, RSI is 0.
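On x86-64, RSI carries the second argument, which is the `page` passed to kernel_sendpage(). With RSI equal to 0, `mov 0x8(%rsi),%rax` reads 8 bytes at address 0x8. That is consistent with the KASAN line "Read of size 8 at addr 0000000000000008".

The sketch below reproduces that arithmetic with a simplified mock. It assumes a 64-bit layout in which `compound_head` sits right after an 8-byte `flags` word. `struct mock_page`, `compound_head_addr()` and `compound_head_load_size()` are hypothetical names, and the real `struct page` is a union of several layouts. The address is computed with `offsetof()`, so nothing is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of the first two words of struct page on x86-64.
 * Fixed-width fields keep the arithmetic the same on any host. */
struct mock_page {
    uint64_t flags;         /* offset 0 */
    uint64_t compound_head; /* offset 8 */
};

static_assert(offsetof(struct mock_page, compound_head) == 8,
              "compound_head is expected 8 bytes into the mock page");

/* Address a load of page->compound_head would touch for a given page
 * pointer value, computed without dereferencing anything. */
uintptr_t compound_head_addr(uintptr_t page)
{
    return page + offsetof(struct mock_page, compound_head);
}

/* Width of that load in bytes (sizeof does not evaluate its operand). */
size_t compound_head_load_size(void)
{
    return sizeof(((struct mock_page *)0)->compound_head);
}
```

Here `compound_head_addr(0)` gives 0x8 and `compound_head_load_size()` gives 8. Both match the faulting address and size in the report.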

    251 static inline unsigned long _compound_head(const struct page *page)
    252 {
    253     unsigned long head = READ_ONCE(page->compound_head);
    254
    255     if (unlikely(head & 1))
    256         return head - 1;
    257     return (unsigned long)page_fixed_fake_head(page);
    258 }
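The tag-bit logic in `_compound_head()` can be sketched in user space as follows. It assumes a tail page stores its head page's address plus 1, which is what `head & 1` and `head - 1` in the listing decode. `mock_compound_head()` and `mock_set_tail()` are hypothetical names. The sketch omits `READ_ONCE()` and the `page_fixed_fake_head()` special case. Note that the very first statement loads `page->compound_head`, so a NULL `page` faults before any of the tag logic runs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock; only the fields the sketch needs. */
struct mock_page {
    uint64_t flags;
    uint64_t compound_head; /* tail pages: head address + 1; else 0 here */
};

/* Record that tail belongs to the compound page starting at head.
 * Page structs are at least 2-byte aligned, so bit 0 is free for the tag. */
void mock_set_tail(struct mock_page *tail, struct mock_page *head)
{
    assert(((uintptr_t)head & 1) == 0);
    tail->compound_head = (uint64_t)(uintptr_t)head + 1;
}

/* Decode in the same order as the quoted _compound_head(): load the field
 * first (this is the load that faults when page == NULL), then test bit 0. */
struct mock_page *mock_compound_head(struct mock_page *page)
{
    uint64_t head = page->compound_head;

    if (head & 1) /* tail page: strip the tag to recover the head */
        return (struct mock_page *)(uintptr_t)(head - 1);
    return page;  /* head or ordinary page: it is its own head */
}
```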

Hmm, maybe that's inside

kernel_sendpage ->
	sendpage_ok ->
		page_count ->
			folio_ref_count ->
				page_folio

... and page is NULL? That would only make sense if we used to survive
calling kernel_sendpage with bvec->bv_page = NULL, which seems unlikely.
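Suppose the NULL really does come from a `bio_vec` entry. While waiting for a vmcore, one way to narrow it down could be to scan the vector before sending and report the first empty slot. The following is a hypothetical user-space sketch of such a scan. `struct mock_bvec` only mimics the `bv_page`, `bv_len` and `bv_offset` fields, and `first_null_bvec()` is not an existing kernel helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, mimicking only the fields of struct bio_vec
 * that matter here. */
struct mock_page { int id; };

struct mock_bvec {
    struct mock_page *bv_page;
    unsigned int bv_len;
    unsigned int bv_offset;
};

/* Return the index of the first entry whose bv_page is NULL, or -1 if every
 * entry has a page. Running this before each send would report the bad slot
 * directly, instead of oopsing later inside _compound_head(). */
long first_null_bvec(const struct mock_bvec *bv, size_t n)
{
    assert(bv != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (bv[i].bv_page == NULL)
            return (long)i;
    }
    return -1;
}
```

For a three-entry vector whose middle slot has no page, the scan returns 1. If only the first, populated entry is checked, it returns -1.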

I'll try to catch a vmcore this time, which will help me see more.

Ben



