David Hildenbrand <david@xxxxxxxxxx> writes: > Working on virtio-mem, I was able to trigger a kernel BUG (with debug > options enabled) when removing memory that was never onlined. I was able > to reproduce with DIMMs. As far as I can see the same can also happen > without debug configs enabled, if we're unlucky and the uninitialized > memmap contains selected garbage . > > The root problem is that we should not try to derive the zone of memory we > are removing from the first PFN. The individual memory blocks of a DIMM > could be spanned by different ZONEs, multiple ZONES (after being offline and > re-onlined) or no ZONE at all (never onlined). > > Let's process all applicable zones when removing memory so we're on the > safe side. In the long term, we want to resize the zones when offlining > memory (and before removing ZONE_DEVICE memory), however, that will require > more thought (and most probably a new SECTION_ACTIVE / pfn_active() > thingy). More details about that in patch #3. > > Along with the fix, some related cleanups. > > v1 -> v2: > - Include "mm: Introduce for_each_zone_nid()" > - "mm/memory_hotplug: Pass nid instead of zone to __remove_pages()" > -- Pass the nid instead of the zone and use it to reduce the number of > zones to process > > --- snip --- > > I gave this a quick test with a DIMM on x86-64: > > Start with a NUMA-less node 1. Hotplug a DIMM (512MB) to Node 1. > 1st memory block is not onlined. 2nd and 4th is onlined MOVABLE. > 3rd is onlined NORMAL. > > :/# echo "online_movable" > /sys/devices/system/memory/memory41/state > [...] > :/# echo "online_movable" > /sys/devices/system/memory/memory43/state > :/# echo "online_kernel" > /sys/devices/system/memory/memory42/state > :/# cat /sys/devices/system/memory/memory40/state > offline > > :/# cat /proc/zoneinfo > Node 1, zone Normal > [...] > spanned 32768 > present 32768 > managed 32768 > [...] > Node 1, zone Movable > [...] > spanned 98304 > present 65536 > managed 65536 > [...] > > Trigger hotunplug. If it succeeds (block 42 can be offlined): > > :/# cat /proc/zoneinfo > > Node 1, zone Normal > pages free 0 > min 0 > low 0 > high 0 > spanned 0 > present 0 > managed 0 > protection: (0, 0, 0, 0, 0) > Node 1, zone Movable > pages free 0 > min 0 > low 0 > high 0 > spanned 0 > present 0 > managed 0 > protection: (0, 0, 0, 0, 0) > > So all zones were properly fixed up and we don't access the memmap of the > first, never-onlined memory block (garbage). I am no longer able to trigger > the BUG. I did a similar test with an already populated node. > I did report a variant of the issue at https://lore.kernel.org/linux-mm/20190514025354.9108-1-aneesh.kumar@xxxxxxxxxxxxx/ This patch series still doesn't handle the fact that struct page backing the start_pfn might not be initialized. ie, it results in crash like below pc: c0000000004bc1ec: shrink_zone_span+0x1bc/0x290 lr: c0000000004bc1e8: shrink_zone_span+0x1b8/0x290 sp: c0000000dac7f910 msr: 800000000282b033 current = 0xc0000000da2fa000 paca = 0xc00000000fffb300 irqmask: 0x03 irq_happened: 0x01 pid = 1224, comm = ndctl kernel BUG at /home/kvaneesh/src/linux/include/linux/mm.h:1088! Linux version 5.3.0-rc6-17495-gc7727d815970-dirty (kvaneesh@ltc-boston123) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #183 SMP Mon Aug 26 09:37:32 CDT 2019 enter ? for help [c0000000dac7f980] c0000000004bc574 __remove_zone+0x84/0xd0 [c0000000dac7f9d0] c0000000004bc920 __remove_section+0x100/0x170 [c0000000dac7fa30] c0000000004bec98 __remove_pages+0x168/0x220 [c0000000dac7fa90] c00000000007dff8 arch_remove_memory+0x38/0x110 [c0000000dac7fb00] c00000000050cb0c devm_memremap_pages_release+0x24c/0x2f0 [c0000000dac7fb90] c000000000cfec00 devm_action_release+0x30/0x50 [c0000000dac7fbb0] c000000000cffe7c release_nodes+0x24c/0x2c0 [c0000000dac7fc20] c000000000cf8988 device_release_driver_internal+0x168/0x230 [c0000000dac7fc60] c000000000cf5624 unbind_store+0x74/0x190 [c0000000dac7fcb0] c000000000cf42a4 drv_attr_store+0x44/0x60 [c0000000dac7fcd0] c000000000617d44 sysfs_kf_write+0x74/0xa0 I do have a few patches to handle the crashes eralier in devm_memremap_pages_release() --- a/mm/memremap.c +++ b/mm/memremap.c @@ -121,7 +121,7 @@ static void devm_memremap_pages_release(void *data) dev_pagemap_cleanup(pgmap); /* pages are dead and unused, undo the arch mapping */ - nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start))); + nid = page_to_nid(pfn_to_page(pfn_first(pgmap))); and also for pfn_first https://www.mail-archive.com/linux-nvdimm@xxxxxxxxxxxx/msg16205.html -aneesh