On 29.08.19 18:27, Michal Hocko wrote: > On Thu 29-08-19 17:54:35, David Hildenbrand wrote: >> On 29.08.19 17:39, Michal Hocko wrote: >>> On Mon 26-08-19 12:10:09, David Hildenbrand wrote: >>>> It is easier than I though to trigger a kernel bug by removing memory that >>>> was never onlined. With CONFIG_DEBUG_VM the memmap is initialized with >>>> garbage, resulting in the detection of a broken zone when removing memory. >>>> Without CONFIG_DEBUG_VM it is less likely - but we could still have >>>> garbage in the memmap. >>>> >>>> :/# [ 23.912993] BUG: unable to handle page fault for address: 000000000000353d >>>> [ 23.914219] #PF: supervisor write access in kernel mode >>>> [ 23.915199] #PF: error_code(0x0002) - not-present page >>>> [ 23.916160] PGD 0 P4D 0 >>>> [ 23.916627] Oops: 0002 [#1] SMP PTI >>>> [ 23.917256] CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317 >>>> [ 23.918900] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 >>>> [ 23.921194] Workqueue: kacpi_hotplug acpi_hotplug_work_fn >>>> [ 23.922249] RIP: 0010:clear_zone_contiguous+0x5/0x10 >>>> [ 23.923173] Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840 >>>> [ 23.926876] RSP: 0018:ffffad2400043c98 EFLAGS: 00010246 >>>> [ 23.927928] RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000 >>>> [ 23.929458] RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40 >>>> [ 23.930899] RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001 >>>> [ 23.932362] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 >>>> [ 23.933603] R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680 >>>> [ 23.934913] FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000 >>>> [ 23.936294] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ 23.937481] CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0 >>>> [ 23.938687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> [ 23.939889] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> [ 23.941168] Call Trace: >>>> [ 23.941580] __remove_pages+0x4b/0x640 >>>> [ 23.942303] ? mark_held_locks+0x49/0x70 >>>> [ 23.943149] arch_remove_memory+0x63/0x8d >>>> [ 23.943921] try_remove_memory+0xdb/0x130 >>>> [ 23.944766] ? walk_memory_blocks+0x7f/0x9e >>>> [ 23.945616] __remove_memory+0xa/0x11 >>>> [ 23.946274] acpi_memory_device_remove+0x70/0x100 >>>> [ 23.947308] acpi_bus_trim+0x55/0x90 >>>> [ 23.947914] acpi_device_hotplug+0x227/0x3a0 >>>> [ 23.948714] acpi_hotplug_work_fn+0x1a/0x30 >>>> [ 23.949433] process_one_work+0x221/0x550 >>>> [ 23.950190] worker_thread+0x50/0x3b0 >>>> [ 23.950993] kthread+0x105/0x140 >>>> [ 23.951644] ? process_one_work+0x550/0x550 >>>> [ 23.952508] ? kthread_park+0x80/0x80 >>>> [ 23.953367] ret_from_fork+0x3a/0x50 >>>> [ 23.954025] Modules linked in: >>>> [ 23.954613] CR2: 000000000000353d >>>> [ 23.955248] ---[ end trace 93d982b1fb3e1a69 ]--- >>> >>> Yes, this is indeed nasty. I didin't think of this when separating >>> memmap initialization from the hotremove. This means that the zone >>> pointer is a garbage in arch_remove_memory already. The proper fix is to >>> remove it from that level down. Moreover the zone is only needed for the >>> shrinking code and zone continuous thingy. The later belongs to offlining >>> code unless I am missing something. I can see that you are removing zone >>> parameter in a later patch but wouldn't it be just better to remove the >>> whole zone thing in a single patch and have this as a bug fix for a rare >>> bug with a fixes tag? >>> >> >> If I remember correctly, this patch already fixed the issue for me, > > That might be the case because nothing else does access zone on the way. > But the pointer is simply bogus. Removing it is the proper way to fix > it. And I argue that zone shouldn't even be necessary. Re-evaluating > continuous status of the zone is really something for offlining phase. > Check how we use pfn_to_online_page there. Yeah I'm with you, I think you spotted patch 6/6 of this series and v3 already that does exactly that. It's just a matter of rearranging things. > >> without the other cleanup (removing the zone parameter). But I might be >> wrong. >> >> Anyhow, I'll send a v4 shortly (either this evening or tomorrow), so you >> can safe yourself some review time and wait for that one :) > > No rush, really... It seems this is quite unlikely event as most hotplug > usecases simply online memory before removing it later on. > I can trigger it reliably right now while working/testing virtio-mem, so I finally want to clean up this mess :) (has been on my list for a long time). I'll try to hunt for the right commit id's that broke it. -- Thanks, David / dhildenb