On Mon, Oct 04, 2021 at 09:54:32AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 30, 2021 at 10:59:57AM -0700, Darrick J. Wong wrote:
> > On Wed, Sep 29, 2021 at 03:21:09PM +0000, Sean Christopherson wrote:
> > > On Tue, Sep 28, 2021, Stephen wrote:
> > > > Hello,
> > > >
> > > > I got this crash again on 5.14.7 in the early morning of the 27th.
> > > > Things hung up shortly after I'd gone to bed. Uptime was 1 day 9 hours 9
> > > > minutes.
> > >
> > > ...
> > >
> > > > BUG: kernel NULL pointer dereference, address: 0000000000000068
> > > > #PF: supervisor read access in kernel mode
> > > > #PF: error_code(0x0000) - not-present page
> > > > PGD 0 P4D 0
> > > > Oops: 0000 [#1] SMP NOPTI
> > > > CPU: 21 PID: 8494 Comm: CPU 7/KVM Tainted: G E 5.14.7 #32
> > > > Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE WIFI/X570
> > > > AORUS ELITE WIFI, BIOS F35 07/08/2021
> > > > RIP: 0010:internal_get_user_pages_fast+0x738/0xda0
> > > > Code: 84 24 a0 00 00 00 65 48 2b 04 25 28 00 00 00 0f 85 54 06 00 00 48
> > > > 81 c4 a8 00 00 00 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <48> 81 78
> > > > 68 a0 a3
> > > >
> > > I haven't reproduced the crash, but the code signature (CMP against an absolute
> > > address) is quite distinct, and is consistent across all three crashes. I'm pretty
> > > sure the issue is that page_is_secretmem() doesn't check for a null page->mapping,
> > > e.g. if the page is truncated, which IIUC can happen in parallel since gup() doesn't
> > > hold the lock.
> > >
> > > I think this should fix the problems?
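A minimal userspace sketch of why the original test lets a truncated page through. It models only the flag-masking comparison and assumes PAGE_MAPPING_FLAGS covers the two low bits (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE, i.e. 0x3). The names old_check_lets_through() and old_check_demo() are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors PAGE_MAPPING_ANON (0x1) | PAGE_MAPPING_MOVABLE (0x2). */
#define PAGE_MAPPING_FLAGS 0x3UL

/*
 * Hypothetical model of the pre-fix test in page_is_secretmem():
 * returns true when "mapping != page->mapping" is false, i.e. when the
 * real function would go on to load mapping->a_ops.
 */
bool old_check_lets_through(uintptr_t raw_mapping)
{
	uintptr_t mapping = raw_mapping & ~PAGE_MAPPING_FLAGS;

	return mapping == raw_mapping;
}

/* Example: NULL is let through, a pointer tagged as anon is rejected. */
void old_check_demo(void)
{
	assert(old_check_lets_through(0));               /* truncated page */
	assert(!old_check_lets_through(0x1000 | 0x1UL)); /* anon flag set */
}
```

Masking the flag bits off NULL still yields NULL, so the comparison passes and the mapping->a_ops load happens on a NULL base. That fits the oops: the bytes starting at <48> decode to cmp QWORD PTR [rax+0x68], imm32, presumably the a_ops field compared against the sign-extended address of secretmem_aops, and with rax == 0 the load faults at 0x68.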
> > >
> > > diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> > > index 21c3771e6a56..988528b5da43 100644
> > > --- a/include/linux/secretmem.h
> > > +++ b/include/linux/secretmem.h
> > > @@ -23,7 +23,7 @@ static inline bool page_is_secretmem(struct page *page)
> > >  	mapping = (struct address_space *)
> > >  		((unsigned long)page->mapping & ~PAGE_MAPPING_FLAGS);
> > >
> > > -	if (mapping != page->mapping)
> > > +	if (!mapping || mapping != page->mapping)
> >
> > I'll roll this out on my vm host and try to re-run the mass fuzztest
> > overnight, though IT claims they're going to kill power to the whole
> > datacenter until Monday(!)...
>
> ...which they did, 30 minutes after I sent this email. :(
>
> I'll hopefully be able to report back to the list in a day or two.

Looks like everything went smoothly with the mass fuzz fstesting.
I'll let you know if I see any further failures, but for now:

Tested-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> --D
>
> >
> > --D
> >
> > >  		return false;
> > >
> > >  	return mapping->a_ops == &secretmem_aops;
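The patched condition can be exercised in a simplified userspace model. This is a sketch under stated assumptions: the struct layouts keep only the fields the check reads, any earlier early-return checks in the real page_is_secretmem() are omitted, and page_is_secretmem_model(), secretmem_check_demo() and other_aops are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: only the fields the check looks at. */
struct address_space_operations { int placeholder; };
struct address_space { const struct address_space_operations *a_ops; };
struct page { struct address_space *mapping; };

/* Assumption: PAGE_MAPPING_ANON (0x1) | PAGE_MAPPING_MOVABLE (0x2). */
#define PAGE_MAPPING_FLAGS 0x3UL

static const struct address_space_operations secretmem_aops;
static const struct address_space_operations other_aops; /* any non-secretmem file */

/* Model of the patched check. */
bool page_is_secretmem_model(const struct page *page)
{
	struct address_space *mapping = (struct address_space *)
		((uintptr_t)page->mapping & ~PAGE_MAPPING_FLAGS);

	/* NULL: truncated page. Mismatch: anon/movable flag bits were set. */
	if (!mapping || mapping != page->mapping)
		return false;

	return mapping->a_ops == &secretmem_aops;
}

/* Example: the four cases the check has to tell apart. */
void secretmem_check_demo(void)
{
	struct address_space secret = { .a_ops = &secretmem_aops };
	struct address_space other = { .a_ops = &other_aops };
	struct page truncated = { .mapping = NULL };
	struct page file_page = { .mapping = &other };
	struct page secret_page = { .mapping = &secret };
	struct page anon_page = {
		.mapping = (struct address_space *)((uintptr_t)&secret | 0x1UL)
	};

	assert(!page_is_secretmem_model(&truncated)); /* no NULL ->a_ops load */
	assert(!page_is_secretmem_model(&file_page));
	assert(page_is_secretmem_model(&secret_page));
	assert(!page_is_secretmem_model(&anon_page));
}
```

Only the condition changes: a NULL result from the mask now returns false before ->a_ops is touched, while the other three cases behave as they did before the patch.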