Re: BUG: kernel NULL pointer dereference, address: 0000000000000042

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2023-02-09 at 21:50 +0000, Matthew Wilcox wrote:
On Fri, Feb 10, 2023 at 08:30:02AM +1100, Dave Chinner wrote:
[cc willy, linux-mm, as it crashed walking the page cache in the
generic fault code]

I've seen this one occasionally, and I'm not sure what's going on.
I've never been able to reproduce it myself, and it seems to disappear
for the people who have been able to reproduce it ;-(

It is 100% my fault and definitely caused by large folios.  In the
XArray, large folios are represented by a folio pointer in the lowest
index occupied by that folio and sibling entries in every other index,
which redirect lookups to the canonical (ie lowest) entry.  This 0x42
that you've managed to find in the XArray is a sibling entry.  It
says that the entry we're actually looking for is at offset 0x10 of
the node we're in. 

Something similar was fixed in commit 63b1898fffcd, but that was a
sibling entry that ended up pointing to a node.  You've *presumably*
hit some kind of temporary situation where the original sibling entry is no
longer pointing to the folio entry that it should be.  However, there's
another possibility, which is that this is not a temporary RCU-induced
state, but we have corruption in the tree.  If we do have corruption,
then you'll see an infinite loop instead of a crash.

If it's a temporary situation, this will fix it.


I'm unfortunately not in a position to test a fix. 

diff --git a/lib/xarray.c b/lib/xarray.c
index ea9ce1f0b386..4237a9647a6a 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -207,7 +207,8 @@ static void *xas_descend(struct xa_state *xas, struct xa_node *node)
        if (xa_is_sibling(entry)) {
                offset = xa_to_sibling(entry);
                entry = xa_entry(xas->xa, node, offset);
-               if (node->shift && xa_is_node(entry))
+               if (xa_is_sibling(entry) ||
+                   (node->shift && xa_is_node(entry)))
                        entry = XA_RETRY_ENTRY;
        }
 
Please do let me know ... you say it's happened twice, but how many
machine-hours did it take to hit twice?


That's hard to say. There are ~5 machines doing this work, the kernel was installed in early February, so around 1000 machine-hours, but what part of the time they were busy and how much of that they were running the triggering workload, I can't say.


On Thu, Feb 09, 2023 at 10:43:10AM +0200, Avi Kivity wrote:
Workload: compilation and running unit tests. The task that crashed is
a unit test.

Kernel: 6.1.8-200.fc37.x86_64

Previously known stable on 5.8.9-200.fc32.x86_64. Two crashes seen so
far.


Feb  7 17:19:33 localhost kernel: BUG: kernel NULL pointer dereference,
address: 0000000000000042
Feb  7 17:19:33 localhost kernel: #PF: supervisor read access in kernel
mode
Feb  7 17:19:33 localhost kernel: #PF: error_code(0x0000) - not-present
page
Feb  7 17:19:33 localhost kernel: PGD 80000001cbb1f067 P4D
80000001cbb1f067 PUD 9cbb75067 PMD 0
Feb  7 17:19:33 localhost kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Feb  7 17:19:33 localhost kernel: CPU: 24 PID: 3718328 Comm:
transport_test Tainted: G S                 6.1.8-200.fc37.x86_64 #1
Feb  7 17:19:33 localhost kernel: Hardware name: Dell Inc. PowerEdge
R730/0599V5, BIOS 2.9.1 12/04/2018
Feb  7 17:19:33 localhost kernel: RIP:
0010:next_uptodate_page+0x46/0x200
Feb  7 17:19:33 localhost kernel: Code: 0f 84 3f 01 00 00 48 81 ff 06
04 00 00 0f 84 b3 00 00 00 48 81 ff 02 04 00 00 0f 84 37 01 00 00 40 f6
c7 01 0f 85 9c 00 00 00 <48> 8b 07 a8 01 0f 85 91 00 00 00 8b 47 34 85
c0 0f 84 86 00 00 00
Feb  7 17:19:33 localhost kernel: RSP: 0000:ffffa83e4ed67cc8 EFLAGS:
00010246
Feb  7 17:19:33 localhost kernel: RAX: 0000000000000042 RBX:
ffffa83e4ed67e00 RCX: 000000000000146e
Feb  7 17:19:33 localhost kernel: RDX: ffffa83e4ed67d20 RSI:
ffff94a9046316b0 RDI: 0000000000000042
Feb  7 17:19:33 localhost kernel: RBP: ffffa83e4ed67d20 R08:
000000000000146e R09: 0000000000dfd000
Feb  7 17:19:33 localhost kernel: R10: 000000000000145f R11:
ffff94978b85960c R12: ffff94a9046316b0
Feb  7 17:19:33 localhost kernel: R13: 000000000000146e R14:
ffff94a9046316b0 R15: ffff948f8bb1f000
Feb  7 17:19:33 localhost kernel: FS:  00007fd68fcb9d40(0000)
GS:ffff949dffd00000(0000) knlGS:0000000000000000
Feb  7 17:19:33 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Feb  7 17:19:33 localhost kernel: CR2: 0000000000000042 CR3:
00000001dc1be005 CR4: 00000000001706e0
Feb  7 17:19:33 localhost kernel: Call Trace:
Feb  7 17:19:33 localhost kernel: <TASK>
Feb  7 17:19:33 localhost kernel: filemap_map_pages+0x9f/0x7b0
Feb  7 17:19:33 localhost kernel: xfs_filemap_map_pages+0x41/0x60 [xfs]
Feb  7 17:19:33 localhost kernel: do_fault+0x1bf/0x430
Feb  7 17:19:33 localhost kernel: __handle_mm_fault+0x63d/0xe40
Feb  7 17:19:33 localhost kernel: ? do_sigaction+0x11a/0x240
Feb  7 17:19:33 localhost kernel: handle_mm_fault+0xdb/0x2d0
Feb  7 17:19:33 localhost kernel: do_user_addr_fault+0x1cd/0x690
Feb  7 17:19:33 localhost kernel: exc_page_fault+0x70/0x170
Feb  7 17:19:33 localhost kernel: asm_exc_page_fault+0x22/0x30
Feb  7 17:19:33 localhost kernel: RIP: 0033:0x1666350
Feb  7 17:19:33 localhost kernel: Code: Unable to access opcode bytes
at 0x1666326.
Feb  7 17:19:33 localhost kernel: RSP: 002b:00007ffde7fa86d8 EFLAGS:
00010212
Feb  7 17:19:33 localhost kernel: RAX: 0000000000000000 RBX:
00007ffde7fa8748 RCX: 0000000002ed4468
Feb  7 17:19:33 localhost kernel: RDX: 00006000000c4f50 RSI:
00007ffde7fa8748 RDI: 0000000000000012
Feb  7 17:19:33 localhost kernel: RBP: 0000000000000012 R08:
0000000000000001 R09: 0000000002f46860
Feb  7 17:19:33 localhost kernel: R10: 00007fd69219cac0 R11:
00007fd69224e670 R12: 0000000000000000
Feb  7 17:19:33 localhost kernel: R13: 00006000000c4f50 R14:
0000000002ed4470 R15: 00007fd693be0000
Feb  7 17:19:33 localhost kernel: </TASK>
Feb  7 17:19:33 localhost kernel: Modules linked in: xsk_diag veth tls
xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat
br_netfilter bridge stp llc intel_rapl_msr dell_wmi iTCO_wdt
dell_smbios intel_pmc_bxt iTCO_vendor_support dell_wmi_descriptor
ledtrig_audio sparse_keymap video dcdbas intel_rapl_common sb_edac
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif
irqbypass rapl intel_cstate intel_uncore ipmi_si ipmi_devintf
ipmi_msghandler nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill
overlay ip_set nf_tables nfnetlink qrtr acpi_power_meter mxm_wmi mei_me
mei lpc_ich auth_rpcgss ip6_tables ip_tables sunrpc zram xfs
crct10dif_pclmul crc32_pclmul nvme crc32c_intel polyval_clmulni
polyval_generic ixgbe ghash_clmulni_intel nvme_core sha512_ssse3
megaraid_sas tg3 mgag200 mdio nvme_common dca wmi scsi_dh_rdac
scsi_dh_emc scsi_dh_alua
Feb  7 17:19:33 localhost kernel: dm_multipath fuse
Feb  7 17:19:33 localhost kernel: CR2: 0000000000000042
Feb  7 17:19:33 localhost kernel: ---[ end trace 0000000000000000 ]---
Feb  7 17:19:33 localhost kernel: RIP:
0010:next_uptodate_page+0x46/0x200
Feb  7 17:19:33 localhost kernel: Code: 0f 84 3f 01 00 00 48 81 ff 06
04 00 00 0f 84 b3 00 00 00 48 81 ff 02 04 00 00 0f 84 37 01 00 00 40 f6
c7 01 0f 85 9c 00 00 00 <48> 8b 07 a8 01 0f 85 91 00 00 00 8b 47 34 85
c0 0f 84 86 00 00 00
Feb  7 17:19:33 localhost kernel: RSP: 0000:ffffa83e4ed67cc8 EFLAGS:
00010246
Feb  7 17:19:33 localhost kernel: RAX: 0000000000000042 RBX:
ffffa83e4ed67e00 RCX: 000000000000146e
Feb  7 17:19:33 localhost kernel: RDX: ffffa83e4ed67d20 RSI:
ffff94a9046316b0 RDI: 0000000000000042
Feb  7 17:19:33 localhost kernel: RBP: ffffa83e4ed67d20 R08:
000000000000146e R09: 0000000000dfd000
Feb  7 17:19:33 localhost kernel: R10: 000000000000145f R11:
ffff94978b85960c R12: ffff94a9046316b0
Feb  7 17:19:33 localhost kernel: R13: 000000000000146e R14:
ffff94a9046316b0 R15: ffff948f8bb1f000
Feb  7 17:19:33 localhost kernel: FS:  00007fd68fcb9d40(0000)
GS:ffff949dffd00000(0000) knlGS:0000000000000000


--
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux