Re: [REGRESSION] Bug 216646 - having TRANSPARENT_HUGEPAGE enabled hangs some applications

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Thu, 3 Nov 2022 16:21:26 +0000

On Thu, Nov 03, 2022 at 02:51:48PM +0100, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking.
> 
> Matthew, I noticed a regression report in bugzilla.kernel.org. As many
> (most?) kernel developer don't keep an eye on it, I decided to forward
> it by mail. Quoting from

Thanks, Thorsten.  I had no idea this issue had been filed.  The sooner
kernel bug tracking switches to something useful like debbugs, the better.

> https://bugzilla.kernel.org/show_bug.cgi?id=216646 :
> 
> >  Mikhail Pletnev 2022-11-01 02:43:59 UTC
> > 
> > Created attachment 303112 [details]
> > dmesg error
> > 
> > After updating kernel past 5.17 (checked in 5.19, 6.06), deluge torrent client began to hang after 1-4 hours of runtime, (when under heavy load - thousands of files mmapped and read at 20+MB/s) with following message in dmesg:
> >   
> > BUG: kernel NULL pointer dereference, address: 0000000000000096
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> > Oops: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 15 PID: 8263 Comm: Disk Not tainted 5.17.0-rc4_ap-00165-g56a4d67c264e-dirty #36
> > Hardware name: Micro-Star International Co., Ltd. MS-7C35/MEG X570 UNIFY (MS-7C35), BIOS A.C3 03/15/2022
> > RIP: 0010:__filemap_get_folio+0x9e/0x350
> > Code: 10 e8 46 06 68 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28
> > RSP: 0000:ffffbe1044ad3cb0 EFLAGS: 00010246
> > RAX: 0000000000000062 RBX: 0000000000000062 RCX: 0000000000000002
> > RDX: 000000000000001c RSI: ffffbe1044ad3cc0 RDI: ffff9fca83239ff0
> > RBP: 0000000000000000 R08: ffffbe1044ad3d40 R09: 0000000000000000
> > R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000
> > R13: ffff9fcbee9efa78 R14: 000000000004285e R15: fff000003fffffff
> > FS:  00007f0a763fc640(0000) GS:ffff9fd23edc0000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000096 CR3: 0000000122c60000 CR4: 0000000000750ee0

Very interesting.  The code disassembles to:

   9:	48 3d 02 04 00 00    	cmp    $0x402,%rax
   f:	74 e2                	je     0xfffffffffffffff3
  11:	48 3d 06 04 00 00    	cmp    $0x406,%rax
  17:	74 da                	je     0xfffffffffffffff3
  19:	48 85 c0             	test   %rax,%rax
  1c:	0f 84 3e 02 00 00    	je     0x260
  22:	a8 01                	test   $0x1,%al
  24:	0f 85 40 02 00 00    	jne    0x26a
  2a:*	8b 40 34             	mov    0x34(%rax),%eax		<-- trapping instruction
  2d:	85 c0                	test   %eax,%eax
  2f:	74 c2                	je     0xfffffffffffffff3

which I recognise as this part of mapping_get_entry(), which must have
been inlined into __filemap_get_folio().  (In future, it helps enormously
if you can run the trace through scripts/decode_stacktrace.sh.)

        folio = xas_load(&xas);
        if (xas_retry(&xas, folio))
                goto repeat;
        /*
         * A shadow entry of a recently evicted page, or a swap entry from
         * shmem/tmpfs.  Return it without attempting to raise page count.
         */
        if (!folio || xa_is_value(folio))
                goto out;

        if (!folio_try_get_rcu(folio))
                goto repeat;

The trap happens when we attempt to load from offset 0x34 of rax,
which is the refcount field of struct folio.  But rax is 0x62 instead of
a valid pointer, so the load faults at 0x62 + 0x34 = 0x96, the address
reported in CR2.  This should not be possible; 0x62 is used to
represent a "sibling entry" in the XArray that underlies the page cache.
xas_descend() checks if you hit a sibling entry, and if you did, it
loads the canonical entry instead.
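
To make the constants in the disassembly easier to follow, here is a
userspace sketch of how XArray entries are tagged.  It is modelled on my
reading of include/linux/xarray.h and is not the kernel code itself: the
helper names mirror the kernel's, but classify() and the enum are
hypothetical, and XA_CHUNK_SIZE is assumed to be 64 (the usual value
when CONFIG_BASE_SMALL is off).

```c
#include <stdbool.h>
#include <stdint.h>

#define XA_CHUNK_SIZE 64UL	/* assumption: XA_CHUNK_SHIFT == 6 */

/* Internal entries have the bottom two bits set to 10. */
static inline uintptr_t xa_mk_internal(unsigned long v)
{
	return ((uintptr_t)v << 2) | 2;
}

/* Value (shadow/swap) entries have the bottom bit set. */
static inline bool xa_is_value(uintptr_t entry)
{
	return entry & 1;
}

static inline bool xa_is_internal(uintptr_t entry)
{
	return (entry & 3) == 2;
}

/* A sibling entry is a small internal entry naming another slot. */
static inline bool xa_is_sibling(uintptr_t entry)
{
	return xa_is_internal(entry) &&
	       entry < xa_mk_internal(XA_CHUNK_SIZE - 1);
}

/* Slot offset that a sibling entry refers to. */
static inline unsigned long xa_to_sibling(uintptr_t entry)
{
	return entry >> 2;
}

#define XA_RETRY_ENTRY	xa_mk_internal(256)	/* 0x402 in the disassembly */
#define XA_ZERO_ENTRY	xa_mk_internal(257)	/* 0x406 in the disassembly */

/* Hypothetical helper, for illustration only. */
enum entry_kind { ENTRY_EMPTY, ENTRY_RETRY, ENTRY_ZERO, ENTRY_VALUE,
		  ENTRY_SIBLING, ENTRY_INTERNAL, ENTRY_POINTER };

static enum entry_kind classify(uintptr_t entry)
{
	if (!entry)
		return ENTRY_EMPTY;
	if (entry == XA_RETRY_ENTRY)
		return ENTRY_RETRY;
	if (entry == XA_ZERO_ENTRY)
		return ENTRY_ZERO;
	if (xa_is_value(entry))
		return ENTRY_VALUE;
	if (xa_is_sibling(entry))
		return ENTRY_SIBLING;
	if (xa_is_internal(entry))
		return ENTRY_INTERNAL;
	return ENTRY_POINTER;
}
```

Under these assumptions, the two cmp instructions correspond to the
retry and zero entries tested by xas_retry(), the "test $0x1,%al" is
xa_is_value(), and classify(0x62) reports a sibling entry that refers to
slot 24 of its node.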

The only way I can see this happening is if there's a sibling entry
pointing to another sibling entry.  What I don't know is whether your
machine is experiencing a temporary glitch in the tree (because it's RCU
protected, it might be observing a store in progress) or whether it has
a corrupted tree where one sibling entry is pointing to another and this
will be observable by any future load (until something happens to
overwrite these entries in the cache).
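
As a rough illustration of that hypothesis, the following userspace toy
models a single node and a lookup that follows a sibling entry exactly
once, which is how I understand xas_descend() to behave.  Everything
here is a simplified stand-in: struct toy_node, toy_descend() and the
fake folio address 0x1000 are hypothetical, there is no RCU, and
XA_CHUNK_SIZE is assumed to be 64.

```c
#include <stdbool.h>
#include <stdint.h>

#define XA_CHUNK_SIZE 64UL	/* assumption: XA_CHUNK_SHIFT == 6 */

/* Hypothetical, heavily simplified stand-in for struct xa_node. */
struct toy_node {
	uintptr_t slots[XA_CHUNK_SIZE];
};

/* Sibling entries are internal entries: (offset << 2) | 2. */
static inline uintptr_t xa_mk_sibling(unsigned long offset)
{
	return ((uintptr_t)offset << 2) | 2;
}

static inline bool xa_is_sibling(uintptr_t entry)
{
	return (entry & 3) == 2 &&
	       entry < xa_mk_sibling(XA_CHUNK_SIZE - 1);
}

static inline unsigned long xa_to_sibling(uintptr_t entry)
{
	return entry >> 2;
}

/*
 * Load a slot; if it holds a sibling entry, redirect once to the slot
 * it names.  The redirect is not repeated, so a sibling that points at
 * another sibling leaks out to the caller.
 */
static uintptr_t toy_descend(const struct toy_node *node,
			     unsigned long offset)
{
	uintptr_t entry = node->slots[offset];

	if (xa_is_sibling(entry)) {
		offset = xa_to_sibling(entry);
		entry = node->slots[offset];
	}
	return entry;
}

/* Healthy multi-slot entry: slots 25 and 26 both point at slot 24. */
static uintptr_t toy_lookup_healthy(unsigned long offset)
{
	struct toy_node node = { { 0 } };

	node.slots[24] = 0x1000;		/* fake folio pointer */
	node.slots[25] = xa_mk_sibling(24);
	node.slots[26] = xa_mk_sibling(24);
	return toy_descend(&node, offset);
}

/* Suspected bad state: slot 26 points at slot 25, itself a sibling. */
static uintptr_t toy_lookup_chained(unsigned long offset)
{
	struct toy_node node = { { 0 } };

	node.slots[24] = 0x1000;		/* fake folio pointer */
	node.slots[25] = xa_mk_sibling(24);	/* 0x62 */
	node.slots[26] = xa_mk_sibling(25);	/* sibling of a sibling */
	return toy_descend(&node, offset);
}
```

In this toy, toy_lookup_healthy(26) returns the fake folio pointer, while
toy_lookup_chained(26) returns 0x62, the same value seen in rax.  It says
nothing about whether the real tree was in a transient or a persistent
bad state; it only shows how a chained sibling could surface as 0x62.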

> > git bisect good 1854bc6e2420472676c5c90d3d6b15f6cd640e40

I suspect this is where your bisection went astray.  This commit should
have been marked bad; marking it good led you to the wrong commit.