Swap was on. Swap is small (127MB). Swap had not been dipped into.

On Thu, Jan 18, 2018 at 02:18:20PM -0800, Laura Abbott wrote:
On 01/18/2018 01:55 PM, Andrew Morton wrote:

[ 24.647744] BUG: unable to handle kernel NULL pointer dereference at 00000008
[ 24.647801] IP: __radix_tree_lookup+0x14/0xa0
[ 24.647811] *pdpt = 00000000253d6027 *pde = 0000000000000000
[ 24.647828] Oops: 0000 [#1] SMP
[ 24.647842] CPU: 5 PID: 3600 Comm: java Not tainted 4.14.13-rh10-20180115190010.xenU.i386 #1
[ 24.647855] task: e52518c0 task.stack: e4e7a000
[ 24.647866] EIP: __radix_tree_lookup+0x14/0xa0
[ 24.647876] EFLAGS: 00010286 CPU: 5
[ 24.647884] EAX: 00000004 EBX: 00000007 ECX: 00000000 EDX: 00000000
[ 24.647895] ESI: 00000000 EDI: 00000000 EBP: e4e7bdb8 ESP: e4e7bda0
[ 24.647904] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[ 24.647917] CR0: 80050033 CR2: 00000008 CR3: 25360000 CR4: 00002660
[ 24.647930] Call Trace:
[ 24.647942] radix_tree_lookup_slot+0x13/0x30
[ 24.647955] find_get_entry+0x1d/0x120
[ 24.647963] pagecache_get_page+0x1f/0x230
[ 24.647975] lookup_swap_cache+0x42/0x140
[ 24.647983] swap_readahead_detect+0x66/0x2e0
[ 24.647993] do_swap_page+0x1fa/0x860
[ 24.648010] ? __raw_callee_save___pv_queued_spin_unlock+0x9/0x10
[ 24.648026] ? xen_pmd_val+0x10/0x20
[ 24.648035] handle_mm_fault+0x6f8/0x1020
[ 24.648046] __do_page_fault+0x18a/0x450
[ 24.648055] ? vmalloc_sync_all+0x250/0x250
[ 24.648063] do_page_fault+0x21/0x30
[ 24.648074] common_exception+0x45/0x4a
[ 24.648082] EIP: 0xb76d873e
[ 24.648088] EFLAGS: 00010206 CPU: 5
[ 24.648096] EAX: 76a10000 EBX: 76a1cd14 ECX: 00000006 EDX: 00000006
[ 24.648105] ESI: 00000040 EDI: b796c380 EBP: 77881008 ESP: 77880ff8
[ 24.648115] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 24.648124] Code: ff ff ff 00 47 03 e9 69 ff ff ff 8b 45 08 89 06 e9 1f ff ff ff 66 90 55 89 e5 57 89 d7 56 53 83 ec 0c 89 45 ec 89 4d e8 8b 45 ec <8b> 58 04 89 d8 83 e0 03 48 89 5d f0 75 64 89 d8 83 e0 fe 0f b6
[ 24.648195] EIP: __radix_tree_lookup+0x14/0xa0 SS:ESP: 0069:e4e7bda0
[ 24.648205] CR2: 0000000000000008
[ 24.648273] ---[ end trace ed356e59f215ce07 ]---

Running that code through decodecode, I get:

   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   57                      push   %edi
   4:   89 d7                   mov    %edx,%edi
   6:   56                      push   %esi
   7:   53                      push   %ebx
   8:   83 ec 0c                sub    $0xc,%esp
   b:   89 45 ec                mov    %eax,-0x14(%ebp)
   e:   89 4d e8                mov    %ecx,-0x18(%ebp)
  11:   8b 45 ec                mov    -0x14(%ebp),%eax
  14:*  8b 58 04                mov    0x4(%eax),%ebx     <-- trapping instruction
  17:   89 d8                   mov    %ebx,%eax
  19:   83 e0 03                and    $0x3,%eax

Which I think means it's looking at offset 4 from whichever argument the
x86 calling convention puts in register %eax. Which I think is argument 0?
Which is the radix tree root. And that makes sense; we're loading the
root node from the radix tree root at offset 4.

The problem is that %eax has the value 4 in it. That would match with
'page_tree' being at offset 4 from the start of address_space. So
find_get_page() got called with a NULL mapping, so pagecache_get_page()
got called with a NULL mapping. Which means I've tracked it back to:

        page = find_get_page(swap_address_space(entry), swp_offset(entry));

and swap_address_space() is returning NULL.

Has this machine run swapoff recently, perhaps?

             total       used       free     shared    buffers     cached
Swap:          127          0        127

PS: cannot recall seeing this issue on x86_64, just 32 bit.
PPS: reminder this is on a Xen VM, which per https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html#PVH-Guest-Specific-Options has "out of sync pagetables", if that is relevant (we do not set that option; I am unsure what default is used).