On 1 Sep 2022, at 9:51, Olga Kornievskaia wrote:
On Tue, Aug 30, 2022 at 1:49 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
On Tue, 2022-08-30 at 13:14 -0400, Olga Kornievskaia wrote:
Hi folks,
Is this a known nfsd kernel oops in 6.0-rc1? I was running xfstests on a
pre-rhel-9.1 client against a 6.0-rc1 server when it panicked.
[ 5554.769159] BUG: KASAN: null-ptr-deref in kernel_sendpage+0x60/0x220
[ 5554.770526] Read of size 8 at addr 0000000000000008 by task nfsd/2590
[ 5554.771899]
No, I haven't seen this one. I'm guessing the page pointer passed to
kernel_sendpage was probably NULL, so this may be a case where something
walked off the end of the rq_pages array?
Beyond that I can't tell much from just this stack trace. It might be
nice to see what line of code kernel_sendpage+0x60 refers to on your
kernel.
After getting debug symbols this is what gdb told me...
(gdb) l *(kernel_sendpage+0x60)
0xffffffff81cbd570 is in kernel_sendpage (./include/linux/page-flags.h:487).
482 TESTCLEARFLAG(LRU, lru, PF_HEAD)
483 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
484 TESTCLEARFLAG(Active, active, PF_HEAD)
485 PAGEFLAG(Workingset, workingset, PF_HEAD)
486 TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
487 __PAGEFLAG(Slab, slab, PF_NO_TAIL)
488 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
489 PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some filesystems */
490
491 /* Xen */
I just oopsed here too on 6.0-rc3, but I didn't get a vmcore. I'll get the
next one and hopefully take it apart a bit further. My oops was on
kernel_sendpage+0x1d:
crash> dis -lrx kernel_sendpage+0x52
/usr/local/src/linux/net/socket.c: 3557
0xffffffff9caf0160 <kernel_sendpage>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
/usr/local/src/linux/net/socket.c: 3558
0xffffffff9caf0165 <kernel_sendpage+0x5>: push %rbx
0xffffffff9caf0166 <kernel_sendpage+0x6>: mov %rdi,%rbx
0xffffffff9caf0169 <kernel_sendpage+0x9>: sub $0x18,%rsp
0xffffffff9caf016d <kernel_sendpage+0xd>: mov 0x20(%rdi),%rax
0xffffffff9caf0171 <kernel_sendpage+0x11>: mov 0xa0(%rax),%r9
0xffffffff9caf0178 <kernel_sendpage+0x18>: test %r9,%r9
0xffffffff9caf017b <kernel_sendpage+0x1b>: je 0xffffffff9caf01b2 <kernel_sendpage+0x52>
/usr/local/src/linux/./include/linux/page-flags.h: 253
0xffffffff9caf017d <kernel_sendpage+0x1d>: mov 0x8(%rsi),%rax
/usr/local/src/linux/./include/linux/page-flags.h: 255
Yes, RSI (the page argument) is 0.
251 static inline unsigned long _compound_head(const struct page *page)
252 {
253 unsigned long head = READ_ONCE(page->compound_head);
254
255 if (unlikely(head & 1))
256 return head - 1;
257 return (unsigned long)page_fixed_fake_head(page);
258 }
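For anyone following along, here is a minimal userspace sketch of what that
helper decodes. The mock_page struct and mock_compound_head() are hypothetical
stand-ins, not kernel code, and the fake-head fixup is left out. It assumes an
LP64 build where unsigned long can hold a pointer. A tail page stores the
address of its head page in compound_head with bit 0 set; anything else is
treated as its own head.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for the start of struct page. */
struct mock_page {
	unsigned long flags;
	unsigned long compound_head;	/* bit 0 set means "this is a tail page" */
};

/*
 * Simplified version of the logic in the listing: the first thing it does
 * is load page->compound_head, so a NULL page faults right here.
 */
static unsigned long mock_compound_head(const struct mock_page *page)
{
	unsigned long head = page->compound_head;

	if (head & 1)
		return head - 1;	/* tail page: strip the tag bit */
	return (unsigned long)page;	/* not a tail page: it is its own head */
}
```

With a tail page whose compound_head is set to (unsigned long)&head | 1, the
helper returns the address of head; with a plain page it returns the page's
own address.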
Hmm, maybe that's inside
kernel_sendpage ->
sendpage_ok ->
page_count ->
folio_ref_count ->
page_folio
.. and page is NULL? That would only make sense if we used to survive
calling kernel_sendpage with bvec->bv_page = NULL, which seems unlikely.
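A quick way to see why a NULL page would fault at address 0x8, matching both
the KASAN report and the mov 0x8(%rsi),%rax in the disassembly. This is only a
sketch: fault_page is a hypothetical two-field stand-in for the start of
struct page, and it assumes a 64-bit build where unsigned long is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: flags first, compound_head right after it. */
struct fault_page {
	unsigned long flags;		/* offset 0 */
	unsigned long compound_head;	/* offset sizeof(unsigned long) */
};

/* Address that a load of page->compound_head touches for a given pointer. */
static uintptr_t compound_head_load_addr(uintptr_t page)
{
	return page + offsetof(struct fault_page, compound_head);
}
```

Calling compound_head_load_addr(0) gives sizeof(unsigned long), which is 8 on
x86_64, so a NULL page pointer lines up with the faulting address in the
report.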
I'll try to catch a vmcore this time, which will help me see more.
Ben