On Tue, Aug 6, 2024 at 5:37 PM Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:
>
> On Tue, Aug 6, 2024 at 4:53 PM Ira Weiny <iweiny@iweiny-mobl> wrote:
> >
> > On Tue, Aug 06, 2024 at 01:59:54PM -0400, Pasha Tatashin wrote:
> > > On Mon, Aug 5, 2024 at 7:06 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > > >
> > > > Pasha Tatashin wrote:
> > > > [..]
> > > > > Thank you for the heads up. Can you please attach a full config file,
> > > > > also was anyone able to reproduce this problem in qemu with emulated
> > > > > nvdimm?
> > > >
> > > > Yes, I can reproduce the crash just by trying to reconfigure the mode of
> > > > a pmem namespace:
> > > >
> > > > # ndctl create-namespace -m raw -f -e namespace0.0
> > > >
> > > > ...where namespace0.0 results from:
> > > >
> > > > memmap=4G!4G
> > > >
> > > > ...passed on the kernel command line.
> > > >
> > > > Kernel config here:
> > > >
> > > > https://gist.github.com/djbw/143705077103d43a735c179395d4f69a
> > >
> > > Excellent, I was able to reproduce this problem.
> > >
> > > The problem appears to be caused by this code:
> > >
> > > Calling page_pgdat() in depopulate_section_memmap():
> > >
> > > static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
> > >                                       struct vmem_altmap *altmap)
> > > {
> > >         unsigned long start = (unsigned long) pfn_to_page(pfn);
> > >         unsigned long end = start + nr_pages * sizeof(struct page);
> > >
> > >         mod_node_page_state(page_pgdat(pfn_to_page(pfn)), NR_MEMMAP,
> > > <<<< We cannot do it.
> > >                             -1L * (DIV_ROUND_UP(end - start, PAGE_SIZE)));
> > >         vmemmap_free(start, end, altmap);
> > > }
> > >
> > > The page_pgdat() returns NULL starting from:
> > > pageunmap_range()
> > >     remove_pfn_range_from_zone() <- page is removed from the zone.
> >
> > Is there any idea on a fix? I'm seeing the same error.
> >
> > [ 561.867431] ? mod_node_page_state+0x11/0xa0
> > [ 561.867963] section_deactivate+0x2a0/0x2c0
> > [ 561.868496] __remove_pages+0x59/0x90
> > [ 561.868975] arch_remove_memory+0x1a/0x40
> > [ 561.869491] memunmap_pages+0x206/0x3d0
> > [ 561.869972] devres_release_all+0xa8/0xe0
> > [ 561.870466] device_unbind_cleanup+0xe/0x70
> > [ 561.870960] device_release_driver_internal+0x1ca/0x210
> > [ 561.871529] driver_detach+0x47/0x90
> > [ 561.871981] bus_remove_driver+0x6c/0xf0
> >
> > Shall we revert this patch until we figure out a fix?
>
> I am working on a fix, and will send it out in a couple hours.

Patch is posted:
https://lore.kernel.org/all/20240806221454.1971755-2-pasha.tatashin@xxxxxxxxxx/#r

>
> Pasha