On 21.04.20 14:52, Michal Hocko wrote:
> On Tue 21-04-20 14:35:12, David Hildenbrand wrote:
>> On 21.04.20 14:30, Michal Hocko wrote:
>>> Sorry for the late reply
>>>
>>> On Thu 16-04-20 12:47:06, David Hildenbrand wrote:
>>>> A hotadded node/pgdat will span no pages at all, until memory is moved to
>>>> the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
>>>> when onlining memory blocks. We don't have to initialize the
>>>> node_start_pfn to the memory we are adding.
>>>
>>> You are right that the node is empty at this phase but that is already
>>> reflected by zero present pages (hmm, I do not see spanned pages to be
>>> set 0 though). What I am missing here is why this is an improvement. The
>>> new node is already visible here and I do not see why we hide the
>>> information we already know.
>>
>> "information we already know" - no, not before we online the memory.
>
> Is this really the case? All add_memory_resource users operate on a
> physical memory range.

Having the first add_memory() magically set the node_start_pfn of a
hotplugged node isn't dangerous, I think we agree on that. It's just
completely unnecessary here, and at least left me confused why this is
needed at all, because the node start/end pfn is only really touched
when onlining/offlining memory (when resizing the zone and the pgdat).

>
>> Before onlining, it's just setting node_start_pfn to *some value* to be
>> overwritten in move_pfn_range_to_zone()->resize_pgdat_range().
>
> Yes the value is overwritten but I am not sure this is actually correct
> thing to do. I cannot remember why I've chosen to do that. It doesn't
> really seem unlikely to online node in a higher physical address.
>

Well, we decided to glue the node span to onlining/offlining of memory.
So, the value really has no meaning without any of that memory being
online/the node span being 0.

> Btw. one thing that I have in my notes, I was never able to actually
> test the no numa node case.
> Because I have always been testing with node
> being allocated during the boot. Do you have any way to trigger this
> path?

Sure, here is my test case

#! /bin/bash
sudo qemu-system-x86_64 \
    --enable-kvm \
    -m 4G,maxmem=20G,slots=2 \
    -smp sockets=2,cores=2 \
    -numa node,nodeid=0,cpus=0-1,mem=4G -numa node,nodeid=1,mem=0G \
    -kernel /home/dhildenb/git/linux/arch/x86_64/boot/bzImage \
    -append "console=ttyS0 rd.shell rd.luks=0 rd.lvm=0 rd.md=0 rd.dm=0 page_owner=on" \
    -initrd /boot/initramfs-5.4.7-200.fc31.x86_64.img \
    -machine pc \
    -nographic \
    -nodefaults \
    -chardev stdio,id=serial \
    -device isa-serial,chardev=serial \
    -chardev socket,id=monitor,path=/var/tmp/monitor,server,nowait \
    -mon chardev=monitor,mode=readline \
    -device virtio-balloon \
    -object memory-backend-ram,id=mem0,size=512M \
    -object memory-backend-ram,id=mem1,size=512M \
    -device pc-dimm,id=dimm0,memdev=mem0,node=1 \
    -device pc-dimm,id=dimm1,memdev=mem1,node=1

Instead of coldplugging the DIMMs to node 1, you could also hotplug
them later (let me know if you need information on how to do that). I
use this test to verify that the node is properly onlined/offlined once
I unplug/replug the two DIMMs (e.g., after onlining/offlining the
memory blocks).

-- 
Thanks,

David / dhildenb
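P.P.S.: Since hotplugging came up: this is roughly how I hotplug/unplug the
DIMMs instead of coldplugging them (the ids mem2/dimm2 are made up for the
example, and the memory block number NN depends on the guest layout):

```
# In the QEMU monitor (e.g., socat /var/tmp/monitor -):
(qemu) object_add memory-backend-ram,id=mem2,size=512M
(qemu) device_add pc-dimm,id=dimm2,memdev=mem2,node=1

# In the guest, online the new memory blocks (unless udev/the kernel
# does it automatically):
echo online > /sys/devices/system/memory/memoryNN/state

# Offline the blocks again, then unplug from the monitor:
(qemu) device_del dimm2
```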