On 25.11.19 18:40, Damian Tometzki wrote: > On Tue, 19. Nov 12:52, David Hildenbrand wrote: >> Our onlining/offlining code is unnecessarily complicated. Only memory >> blocks added during boot can have holes (a range that is not >> IORESOURCE_SYSTEM_RAM). Hotplugged memory never has holes (e.g., see >> add_memory_resource()). All memory blocks that belong to boot memory are >> already online. >> >> Note that boot memory can have holes and the memmap of the holes is marked >> PG_reserved. However, also memory allocated early during boot is >> PG_reserved - basically every page of boot memory that is not given to the >> buddy is PG_reserved. >> >> Therefore, when we stop allowing to offline memory blocks with holes, we >> implicitly no longer have to deal with onlining memory blocks with holes. >> E.g., online_pages() will do a >> walk_system_ram_range(..., online_pages_range), whereby >> online_pages_range() will effectively only free the memory holes not >> falling into a hole to the buddy. The other pages (holes) are kept >> PG_reserved (via move_pfn_range_to_zone()->memmap_init_zone()). >> >> This allows to simplify the code. For example, we no longer have to >> worry about marking pages that fall into memory holes PG_reserved when >> onlining memory. We can stop setting pages PG_reserved completely in >> memmap_init_zone(). >> >> Offlining memory blocks added during boot is usually not guaranteed to work >> either way (unmovable data might have easily ended up on that memory during >> boot). So stopping to do that should not really hurt. Also, people are not >> even aware of a setup where onlining/offlining of memory blocks with >> holes used to work reliably (see [1] and [2] especially regarding the >> hotplug path) - I doubt it worked reliably. >> >> For the use case of offlining memory to unplug DIMMs, we should see no >> change. (holes on DIMMs would be weird). >> >> Please note that hardware errors (PG_hwpoison) are not memory holes and >> are not affected by this change when offlining. >> >> [1] https://lkml.org/lkml/2019/10/22/135 >> [2] https://lkml.org/lkml/2019/8/14/1365 >> >> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx> >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >> Cc: Michal Hocko <mhocko@xxxxxxxx> >> Cc: Oscar Salvador <osalvador@xxxxxxx> >> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx> >> Cc: Dan Williams <dan.j.williams@xxxxxxxxx> >> Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx> >> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> >> --- >> >> This patch was part of: >> [PATCH v1 00/10] mm: Don't mark hotplugged pages PG_reserved >> (including ZONE_DEVICE) >> -> https://www.spinics.net/lists/linux-driver-devel/msg130042.html >> >> However, before we can perform the PG_reserved changes, we have to fix >> pfn_to_online_page() in special scenarios first (bootmem and devmem falling >> into a single section). Dan is working on that. >> >> I propose to give this patch a churn in -next so we can identify if this >> change would break any existing setup. I will then follow up with cleanups >> and the PG_reserved changes later. >> >> --- >> mm/memory_hotplug.c | 28 ++++++++++++++++++++++++++-- >> 1 file changed, 26 insertions(+), 2 deletions(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 46b2e056a43f..fc617ad6f035 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1455,10 +1455,19 @@ static void node_states_clear_node(int node, struct memory_notify *arg) >> node_clear_state(node, N_MEMORY); >> } >> >> +static int count_system_ram_pages_cb(unsigned long start_pfn, >> + unsigned long nr_pages, void *data) >> +{ >> + unsigned long *nr_system_ram_pages = data; >> + >> + *nr_system_ram_pages += nr_pages; >> + return 0; >> +} >> + > Hello David, > > what is the meaning of the function ? The return is every time 0. > > I dont understand it ? > Hi Damian, please see how these callbacks are used within walk_system_ram_range(). A return value of 0 only means "don't stop iterating", the actual values are returned via the "void *data" parameter in this instance. Cheers! -- Thanks, David / dhildenb