On Wed, Jun 02, 2021 at 09:45:58PM +0200, Oscar Salvador wrote:
> It was too nice and easy to be true I guess.
> Yeah, I missed the point that blocking might be an issue here, and hotplug
> operations can take really long, so not an option.
> I must have switched my brain off back there, because now it is just too
> obvious.
>
> Then I guess that seqlock must stay and the only thing that can go is the
> pgdat resize lock from the hotplug code.

So, I have been looking into this again.
Of course, the approach taken here is outrageously wrong, but there are
some other things that are a bit confusing.

As pointed out, bad_range() (the function that ends up calling
page_outside_zone_boundaries()) is called from different functions via
VM_BUG_ON_*.
page_outside_zone_boundaries() takes the zone's span seqlock as a reader, so
that it does not act on stale zone boundaries if it races with memhotplug
operations, and under that seqlock it calls zone_spans_pfn() to do the
actual check.

Now onto the funny thing.

We do have several places happily calling zone_spans_pfn() without holding
the zone's seqlock, e.g.:

set_pageblock_migratetype
 set_pfnblock_flags_mask
  zone_spans_pfn

move_freepages_block
 zone_spans_pfn

alloc_contig_pages
 zone_spans_last_pfn
  zone_spans_pfn

Those places hold zone->lock, while move_pfn_range_to_zone() and
remove_pfn_range_from_zone() hold zone->seqlock, so AFAICS those places
could read a stale value and proceed thinking the range is within the zone
while it is not.

So I guess my question is: should we force those places to take the
seqlock as readers, as we do in page_outside_zone_boundaries() (or maybe
just move the seqlock handling into zone_spans_pfn())?
Because it does not make much sense to take it in a CONFIG_DEBUG_VM context
and not in "real life".

Thoughts?

-- 
Oscar Salvador
SUSE L3