On 12.04.21 17:27, Mel Gorman wrote:
> On Mon, Apr 12, 2021 at 04:12:11PM +0200, David Hildenbrand wrote:
>>> After v1 of the patch, the race was reduced to the point between the
>>> zone watermark check and the rmqueue_pcplist but yes, it still existed.
>>> Closing it completely was either complex or expensive. Setting
>>> zone->pageset = &boot_pageset before the free would shrink the race
>>> further but that still leaves a potential memory ordering issue.
>>>
>>> While fixable, it's either complex, expensive or both so yes, just leaving
>>> the pageset structures in place would be much more straightforward
>>> assuming the structures were not allocated in the zone that is being
>>> hot-removed. As things stand, I had trouble even testing zone hot-remove
>>> as there were always a few pages left behind and I did not chase down
>>> why.
>>
>> Can you elaborate? I can reliably trigger zone present pages going to 0 by
>> just hotplugging a DIMM, onlining the memory block devices to the MOVABLE
>> zone, followed by offlining the memory block again.
>>
>
> For the machine I was testing on, I tried offlining all memory within
> a zone on a NUMA machine. Even if I used movable_zone to create a zone
> or numa=fake to create multiple fake nodes and zones, there were always
> either reserved or pinned pages preventing the full zone being removed.

What can happen is that memblock allocations are still placed into the
MOVABLE zone -- even with "movable_node" IIRC.

Memory hot(un)plug is usually best tested in QEMU via pc-dimm devices.
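
As a rough illustration of the sequence described above (hotplug a DIMM,
online its memory blocks to the MOVABLE zone, offline them again), here is
a minimal Python sketch against the sysfs memory block interface. The helper
names are hypothetical and only for illustration. It assumes memory is not
auto-onlined, so newly added blocks show up as "offline", and that no other
blocks were offline beforehand. In QEMU, the guest would typically be started
with something like -m 2G,slots=2,maxmem=4G and the DIMM added from the
monitor via object_add memory-backend-ram,id=mem1,size=1G followed by
device_add pc-dimm,id=dimm1,memdev=mem1 (IDs and sizes are placeholders).

```python
from pathlib import Path

# Standard sysfs directory for memory block devices.
SYSFS_MEMORY = Path("/sys/devices/system/memory")

# States the kernel accepts when written to memoryN/state.
VALID_STATES = {"online", "online_kernel", "online_movable", "offline"}


def blocks_in_state(state, root=SYSFS_MEMORY):
    """Return sorted block numbers whose state file currently reads `state`."""
    found = []
    for state_file in root.glob("memory[0-9]*/state"):
        if state_file.read_text().strip() == state:
            found.append(int(state_file.parent.name[len("memory"):]))
    return sorted(found)


def set_block_state(block, state, root=SYSFS_MEMORY):
    """Request a state change for one memory block (needs root on real sysfs)."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state!r}")
    (root / f"memory{block}" / "state").write_text(state)


def cycle_through_movable(root=SYSFS_MEMORY):
    """Online every offline block to ZONE_MOVABLE, then offline the same blocks.

    Assumption: every block that is offline right now belongs to the DIMM
    that was just hot-added.
    """
    blocks = blocks_in_state("offline", root)
    for block in blocks:
        set_block_state(block, "online_movable", root)
    for block in blocks:
        set_block_state(block, "offline", root)
    return blocks
```

Calling cycle_through_movable() as root inside the guest after the
device_add should run the full cycle; the "present" counter of the Movable
zone in /proc/zoneinfo can then be checked by hand. The root parameter only
exists so the helpers can be pointed at a scratch directory for a dry run.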
--
Thanks,
David / dhildenb