On Fri, Nov 17, 2023 at 04:37:29PM +0100, David Hildenbrand wrote:
>
> It might make sense to
>
> 1) Send the first 3 out separately

Ok sure, I will first send the 3 patches as bug fixes with your feedback
applied.

> 2) Look into a simple variant that leaves __add_pages() calls alone and
> only adds the new MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers --
> well, and deals with an inaccessible altmap, like the
> page_init_poison() when the altmap might be inaccessible.

Thanks for the valuable feedback. I just quickly tried this out with
page_init_poison() disabled and the hack in arch_add_memory() and
arch_remove_memory() removed, and also used the new
MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers. The test results so far
look promising: it seems to work and I have found no issues. I will also
double check whether there are any other memmap accesses in the
add_pages() phase.

We will try to go for this approach for now, i.e. with the notifiers you
suggested, and the __add_pages() change.

Do you have any suggestions on how we could check for an inaccessible
altmap?

> 3) Look into a proper interface to add/remove memory instead of relying
> on online/offline.

Agreed for the long term.

>
> 2) is certainly an improvement and might be desired in some cases. 3) is
> more powerful (e.g., where you don't want an altmap because of
> fragmentation) and future proof.
>
> I suspect there will be installations where an altmap is undesired: it
> fragments your address space with unmovable (memmap) allocations. Currently,
> runtime allocations of gigantic pages are affected. Long-term other large
> allocations (if we ever see very large THP) will be affected.
>
> For that reason, we want to either support variable-sized memory blocks
> long-term, or simulate that by "grouping" memory blocks that share a same
> altmap located on the first memory blocks in that group: but onlining one
> block forces onlining of the whole group.
>
> On s390x that adds all memory ahead of time, it's hard to make a decision
> what the right granularity will be, and seeing sudden online/offline changed
> behavior might be quite "surprising" for users. The user can give better
> hints when adding/removing memory explicitly.

Thanks for providing insights and details.