On Wed, May 22, 2024 at 05:24:17PM +0200, David Hildenbrand wrote: > On 22.05.24 16:27, Brendan Jackman wrote: > > On Wed, May 22, 2024 at 04:09:41PM +0200, David Hildenbrand wrote: > > By the way, some noob questions: am I OK with my assumption that it's > > fine for reader code to operate on zone spans that are both stale and > > "from the future"? thinking abstractly I guess that seeing a stale > > value when racing with offline_pages is roughly the same as seeing a > > value "from the future" when racing with online_pages? > > Right. PFN walkers should be using pfn_to_online_page(), where races are > possible but barely seen in practice. > > zone handlers like mm/compaction.c can likely deal with races, although it > might all be cleaner (and safer?) when using start+end. I recall it also > recalls on pfn_to_online_page(). > > Regarding page_outside_zone_boundaries(), it should be fine if we can read > start+end atomically, that way we would not accidentally report "page > outside ..." when changing the start address. I think with your current > patch that might happen (although likely extremely hard to trigger) when > growing the zone at the start, reducing zone_start_pfn. Thanks a lot, this is very helpful > > Also, is it ever possible for pages to get removed and then added back > > and end up in a different zone than before? > > Yes. Changing between MOVABLE and NORMAL is possible and can easily be > triggered by offlining+re-onlining memory blocks. So, even if we make it impossible to see a totally bogus zone span, you can observe a stale/futuristic span which currently contains pages from a different zone? That seems to imply you could look up a page page from a PFN within zone A's apparent span, lock zone A and assume you can safely modify the freelist the page is on, but actually that page is now in zone B. So for example: 1. compact_zone() sets cc->free_pfn based on zone_end_pfn 2. isolate_freepages() sets isolate_start_pfn = cc->free_pfn 3. isolate_freepages_block() looks up a page based on that PFN 3. ... then takes the cc->zone lock 4. ... then calls __isolate_free_page which removes the page from whatever freelist it's on. Is anything stopping part 4 from modifying a zone that wasn't locked in part 3?