On 31.01.22 12:29, Oscar Salvador wrote:
> On Fri, Jan 28, 2022 at 04:26:20PM +0100, David Hildenbrand wrote:
>> For memory hot(un)plug, we only really care about memory blocks that:
>> * span a single zone (and, thereby, a single node)
>> * are completely System RAM (IOW, no holes, no ZONE_DEVICE)
>> If one of these conditions is not met, we reject memory offlining.
>> Hotplugged memory blocks (starting out offline) always meet both
>> conditions.

Thanks for the review, Oscar!

>
> This has always been hard for me to follow, so bear with me.
>
> I remember we changed the memory-hotplug policy, not long ago, wrt.
> what we can online/offline so we could get rid of certain assumptions like
> "there are no holes in this memblock, so it can go" etc.

Yes, end of 2019 via c5e79ef561b0 ("mm/memory_hotplug.c: don't allow to
online/offline memory blocks with holes").

>
> AFAIR, we can only offline if the memory
>
> 1) belongs to a single node ("which is always the case for
> hotplugged-memory, boot memory is trickier")
> 2) does not have any holes
> 3) spans a single zone
>
> These are the only requirements we have atm, right?

The most prominent core requirements, yes, leaving memory notifiers out of
the picture.

3) implies 1) as zones are per-node.

>
> By default, hotplugged memory already complies with all three,
> only in case of ZONE_DEVICE stuff we might violate 2) and 3).
>
>> There are three scenarios to handle:
> ...
> ...
>
>> @@ -225,6 +226,9 @@ static int memory_block_offline(struct memory_block *mem)
>>  	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
>>  	int ret;
>>
>> +	if (!mem->zone)
>> +		return -EBUSY;
>
> Shouldn't we return -EINVAL? I mean, -EBUSY reads like this might be a
> temporary error which might get fixed later on, but that isn't the case.
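The single-zone rule and the `if (!mem->zone)` gate quoted above can be modeled in a few lines of user-space C. This is a hypothetical sketch, not the patch: `struct toy_zone`, `toy_zone_intersects()`, `toy_zone_for_block()` and `toy_block_offline()` are made-up names loosely standing in for `struct zone`, `zone_intersects()`, `early_node_zone_for_memory_block()` and `memory_block_offline()`. It assumes the zone lookup walks the node's zones and keeps at most one intersecting match (as the `matching_zone` variable in the hunk further down suggests), and it leaves the holes/ZONE_DEVICE checks out entirely. The gate returns -EBUSY as in the quoted hunk; whether -EINVAL fits better is exactly the open question above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct zone: owning node and the
 * PFN span [start_pfn, start_pfn + spanned_pages). */
struct toy_zone {
	int nid;
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/* Does the zone's span overlap [start_pfn, start_pfn + nr_pages)?
 * Empty zones never intersect anything. */
static int toy_zone_intersects(const struct toy_zone *z,
			       unsigned long start_pfn, unsigned long nr_pages)
{
	if (z->spanned_pages == 0)
		return 0;
	return start_pfn < z->start_pfn + z->spanned_pages &&
	       z->start_pfn < start_pfn + nr_pages;
}

/* Return the only zone of node @nid that intersects the memory block,
 * or NULL if no zone or more than one zone does. NULL means the block
 * does not span a single zone and must not be offlined. */
static const struct toy_zone *toy_zone_for_block(const struct toy_zone *zones,
						 size_t nr_zones, int nid,
						 unsigned long start_pfn,
						 unsigned long nr_pages)
{
	const struct toy_zone *match = NULL;

	assert(zones != NULL || nr_zones == 0);
	for (size_t i = 0; i < nr_zones; i++) {
		const struct toy_zone *z = &zones[i];

		if (z->nid != nid || !toy_zone_intersects(z, start_pfn, nr_pages))
			continue;
		if (match)
			return NULL;	/* block spans multiple zones */
		match = z;
	}
	return match;
}

/* Offline gate modeled on the quoted "if (!mem->zone) return -EBUSY;". */
static int toy_block_offline(const struct toy_zone *block_zone)
{
	if (!block_zone)
		return -EBUSY;
	return 0;	/* the real code would go on to call offline_pages() */
}
```

For example, a block covering PFNs [0x4000, 0xc000) on a node whose zones split at PFN 0x8000 intersects two zones, so the lookup yields NULL and the gate refuses to offline it, while a block fully inside one zone gets that zone.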
>
>> @@ -234,7 +238,7 @@ static int memory_block_offline(struct memory_block *mem)
>>  				     -nr_vmemmap_pages);
>>
>>  	ret = offline_pages(start_pfn + nr_vmemmap_pages,
>> -			    nr_pages - nr_vmemmap_pages, mem->group);
>> +			    nr_pages - nr_vmemmap_pages, mem->zone, mem->group);
>
> Why not pass the node as well?

The zone implies the node, and the prototype now matches the one of
online_pages(). So if we'd ever want to change that, we should do it for
both functions, but I don't necessarily see the need for it.

>
>> +static struct zone *early_node_zone_for_memory_block(struct memory_block *mem,
>> +						      int nid)
>> +{
>> +	const unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
>> +	const unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
>> +	struct zone *zone, *matching_zone = NULL;
>> +	pg_data_t *pgdat = NODE_DATA(nid);
>
> I was about to complain because in init_memory_block() you call
> early_node_zone_for_memory_block() with nid == NUMA_NO_NODE, but then
> I saw that NODE_DATA on !CONFIG_NUMA falls back to contig_page_data.
> So, I guess we cannot really reach this on CONFIG_NUMA machines with nid
> being NUMA_NO_NODE, right? (do we want to add a check just in case?)
>

Yes, on CONFIG_NUMA this is only called via memory_block_set_nid().
memory_block_set_nid() is only available with CONFIG_NUMA, and calling
memory_block_set_nid() with NUMA_NO_NODE would be a BUG.

(Before sending this out, I even had a BUG_ON() in memory_block_set_nid() to
verify that, but I removed it because BUG_ON()s are frowned upon.)

-- 
Thanks,

David / dhildenb