On Fri 04-09-20 10:25:02, Pavel Tatashin wrote:
> > Another alternative would be to enable/disable static branch only from
> > users who really care but this is quite tricky because how do you tell
> > you need or not? It seems that alloc_contig_range would be just fine
> > with a weaker semantic because it would "only" lead to a spurious
> > failure.
> > Memory hotplug on the other hand really needs to have a point where
> > nobody interferes with the offlined memory so it could ask for a
> > stronger semantic.
> >
> > Yet another option would be to make draining stronger and actually
> > guarantee there are no in-flight pages to be freed to the pcp list.
> > One way would be to tweak pcp->high and implement a strong barrier
> > (IPI?) to sync with all CPUs. Quite expensive, especially when there are
> > many draining requests (read cma users because hotplug doesn't really
> > matter much as it happens seldom).
> >
> > So no nice&cheap solution I can think of...
>
> I think start_isolate_page_range() should not be doing page draining
> at all. It should isolate ranges, meaning set appropriate flags, but
> draining should be performed by the users when appropriate: next to
> lru_add_drain_all() calls both in CMA and hotplug.

I disagree. The pcp draining is an implementation detail and we
shouldn't bother callers to be aware of it.

> Currently, the way start_isolate_page_range() drains pages is very
> inefficient. It calls drain_all_pages() for every valid page block,
> which is a slow call as it starts a thread per cpu, and waits for
> those threads to finish before returning.

This is an implementation detail.

> We could optimize by moving the drain_all_pages() calls from
> set_migratetype_isolate() to start_isolate_page_range() and call it
> once for every different zone, but both current users of this
> interface guarantee that all pfns [start_pfn, end_pfn] are within the
> same zone, and I think we should keep it this way, so again the extra
> traversal is going to be overhead.

Again this just leads to tricky code. Just look at how easy it was to
break this by removing something that looked clearly like a duplicate
call. It is true that memory isolation usage is limited to only a few
use cases, but I would strongly prefer to make the semantics clear so
that we do not repeat this kind of regression.

-- 
Michal Hocko
SUSE Labs