On Tue, Aug 23, 2022 at 05:51:25PM +0200, David Hildenbrand wrote: > >> The race is simple -- page allocation could be in progress when a memory > >> hot-remove operation triggers a zonelist rebuild that removes zones. > >> The allocation request will still have a valid ac->preferred_zoneref that > >> is now pointing to NULL and triggers an OOM kill. > >> > >> This problem probably always existed but may be slighly easier to trigger > > s/slighly/slightly/ > Fixed. > >> due to 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones > >> with pages managed by the buddy allocator") which distinguishes between > >> zones that are completely unpopulated versus zones that have valid pages > >> but they are all reserved. Memory hotplug had multiple stages with > > Not necessarily reserved, simply not managed by the buddy (e.g., early > allocations, memory ballooning / virtio-mem). > Fair point, I filed all that under "reserved" but you're right, this is clearer. > >> +/* > >> + * Zonelists may change due to hotplug during allocation. Detect when zonelists > >> + * have been rebuilt so allocation retries. Reader side does not lock and > >> + * retries the allocation if zonelist changes. Writer side is protected by the > >> + * embedded spin_lock. > >> + */ > >> +DEFINE_SEQLOCK(zonelist_update_seq); > >> + > >> +static unsigned int zonelist_iter_begin(void) > >> +{ > > > > You likely want something like > > if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE)) > > return read_seqbegin(&zonelist_update_seq); > > return 0; > > > >> + return read_seqbegin(&zonelist_update_seq); > >> +} > >> + > >> +static unsigned int check_retry_zonelist(unsigned int seq) > >> +{ > > if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE)) > > return read_seqretry(&zonelist_update_seq, seq); > > return seq; > > > >> + return read_seqretry(&zonelist_update_seq, seq); > >> +} > >> + > > > > to avoid overhead on systems without HOTREMOVE configured. > > > > Other than that LGTM. > > Acked-by: Michal Hocko <mhocko@xxxxxxxx> > It's a minor saving but fair enough. HOTREMOVE is now the only reason zonelists can be rebuilt. While that was not always true, if it ever changes again, it's a simple fix. Thanks Michal. > Makes sense to me, although I wonder how much it will matter in practice. > Probably none at all as it's one branch but it's still valid. > Reviewed-by: David Hildenbrand <david@xxxxxxxxxx> > Thanks David. -- Mel Gorman SUSE Labs