Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

Oscar Salvador <osalvador@xxxxxxx> · Thu, 3 Jun 2021 10:38:24 +0200

On Wed, Jun 02, 2021 at 09:45:58PM +0200, Oscar Salvador wrote:
> It was too nice and easy to be true I guess.
> Yeah, I missed the point that blocking might be an issue here, and hotplug
> operations can take really long, so not an option.
> I must have switched my brain off back there, because now it is just too
> obvious.
> 
> Then I guess that seqlock must stay and the only thing that can go is the
> pgdat resize lock from the hotplug code.

So, I have been looking into this again.
Of course, the approach taken here is outrageously wrong, but there are
some other things that are a bit confusing.

As pointed out, bad_range() (the function that ends up calling
page_outside_zone_boundaries()) is called from different functions via VM_BUG_ON_*.
page_outside_zone_boundaries() takes the zone's span seqlock to avoid
reading stale values, which can happen if we race with memory hotplug
operations, and calls zone_spans_pfn() under it to do the actual check.
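For reference, here is a userspace model of that read-side pattern. It is a sketch, not kernel code: struct mock_zone, span_seqbegin() and span_seqretry() are hypothetical stand-ins for struct zone, zone_span_seqbegin() and zone_span_seqretry(), and C11 atomics take the place of the kernel's seqcount barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the span fields of struct zone and their seqcount. */
struct mock_zone {
	atomic_uint span_seq;			/* even: stable, odd: update in progress */
	_Atomic unsigned long zone_start_pfn;
	_Atomic unsigned long spanned_pages;
};

static void mock_zone_init(struct mock_zone *z, unsigned long start,
			   unsigned long spanned)
{
	assert(start + spanned >= start);	/* span must not wrap around */
	atomic_init(&z->span_seq, 0u);
	atomic_init(&z->zone_start_pfn, start);
	atomic_init(&z->spanned_pages, spanned);
}

/* Same check as zone_spans_pfn(): no locking of its own, end pfn exclusive. */
static bool mock_zone_spans_pfn(struct mock_zone *z, unsigned long pfn)
{
	unsigned long start = atomic_load(&z->zone_start_pfn);

	return start <= pfn && pfn < start + atomic_load(&z->spanned_pages);
}

static unsigned span_seqbegin(struct mock_zone *z)
{
	unsigned seq;

	while ((seq = atomic_load(&z->span_seq)) & 1u)
		;	/* a writer is mid-update: wait for it to finish */
	return seq;
}

static bool span_seqretry(struct mock_zone *z, unsigned seq)
{
	return atomic_load(&z->span_seq) != seq;
}

/*
 * Read side as in page_outside_zone_boundaries(): repeat the check until it
 * ran against a span that no writer touched in the meantime.
 */
static bool mock_pfn_outside_zone(struct mock_zone *z, unsigned long pfn)
{
	unsigned seq;
	bool outside;

	do {
		seq = span_seqbegin(z);
		outside = !mock_zone_spans_pfn(z, pfn);
	} while (span_seqretry(z, seq));

	return outside;
}
```

With a zone initialized to span pfns [0x1000, 0x1800), mock_pfn_outside_zone() reports 0x1200 as inside and 0x1800 as outside, since the end pfn is exclusive.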

Now on to the funny thing.

We do have several places happily calling zone_spans_pfn() without
holding zone's seqlock, e.g:

set_pageblock_migratetype
 set_pfnblock_flags_mask
  zone_spans_pfn

move_freepages_block
 zone_spans_pfn

alloc_contig_pages
 zone_spans_last_pfn
  zone_spans_pfn

Those places hold zone->lock, while move_pfn_range_to_zone() and
remove_pfn_range_from_zone() update the span under zone->span_seqlock.
Since zone->lock does not serialize against those updates, AFAICS those
places could read a stale value and proceed thinking the range is within
the zone while it is not.
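To make the window concrete, a deterministic single-threaded model (hypothetical names, not kernel code) of a reader that checks the span without the seqlock while a writer is halfway through shrinking the zone from the front. Updating the start pfn and the span size takes two separate stores; the store order used here is an assumption for illustration, but either order leaves a moment where the pair is inconsistent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two span fields of struct zone. */
struct mock_zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* Same check as zone_spans_pfn(), done with no span locking at all. */
static bool mock_zone_spans_pfn(const struct mock_zone *z, unsigned long pfn)
{
	return z->zone_start_pfn <= pfn &&
	       pfn < z->zone_start_pfn + z->spanned_pages;
}

/*
 * Hypothetical: shrink the zone [0x1000, 0x1800) to [0x1400, 0x1800) with
 * two stores and run an unlocked check of @pfn between them.
 * Returns what that check saw.
 */
static bool spans_during_shrink(unsigned long pfn)
{
	struct mock_zone z = { .zone_start_pfn = 0x1000, .spanned_pages = 0x800 };
	unsigned long end = z.zone_start_pfn + z.spanned_pages;	/* 0x1800 */
	unsigned long new_start = 0x1400;
	bool seen;

	z.zone_start_pfn = new_start;		/* torn view: [0x1400, 0x1c00) */
	seen = mock_zone_spans_pfn(&z, pfn);	/* unlocked reader runs here */
	z.spanned_pages = end - new_start;	/* final span: [0x1400, 0x1800) */

	assert(!mock_zone_spans_pfn(&z, 0x1a00));	/* consistent again */
	return seen;
}
```

spans_during_shrink(0x1a00) returns true even though pfn 0x1a00 lies outside the zone both before ([0x1000, 0x1800)) and after ([0x1400, 0x1800)) the update.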

So I guess my question is: should we force those places to take the
seqlock as readers, as we do in page_outside_zone_boundaries() (or maybe
just move the seqlock handling into zone_spans_pfn())?

Because it does not make much sense to take it in a CONFIG_DEBUG_VM
context and not in "real life".
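Moving the seqlock handling into zone_spans_pfn() could look roughly like the sketch below. Again a userspace model under stated assumptions: mock_zone_spans_pfn_stable() and mock_resize_zone() are hypothetical, the writer stands in for a span update done under zone_span_writelock(), and the model assumes a single writer instead of the spinlock that serializes seqlock writers in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the span fields of struct zone and their seqcount. */
struct mock_zone {
	atomic_uint span_seq;			/* even: stable, odd: writer active */
	_Atomic unsigned long zone_start_pfn;
	_Atomic unsigned long spanned_pages;
};

static void mock_zone_init(struct mock_zone *z, unsigned long start,
			   unsigned long spanned)
{
	atomic_init(&z->span_seq, 0u);
	atomic_init(&z->zone_start_pfn, start);
	atomic_init(&z->spanned_pages, spanned);
}

/* Writer: stands in for a span update done under zone_span_writelock(). */
static void mock_resize_zone(struct mock_zone *z, unsigned long start,
			     unsigned long spanned)
{
	unsigned prev = atomic_fetch_add(&z->span_seq, 1u);	/* now odd */

	assert((prev & 1u) == 0);	/* this model allows a single writer only */
	atomic_store(&z->zone_start_pfn, start);
	atomic_store(&z->spanned_pages, spanned);
	atomic_fetch_add(&z->span_seq, 1u);			/* even again */
}

/*
 * Hypothetical zone_spans_pfn() variant doing the retry loop itself, so
 * callers like move_freepages_block() would get a consistent start/size
 * pair without extra code at each call site.
 */
static bool mock_zone_spans_pfn_stable(struct mock_zone *z, unsigned long pfn)
{
	unsigned seq;
	unsigned long start, end;

	do {
		while ((seq = atomic_load(&z->span_seq)) & 1u)
			;	/* writer mid-update: wait */
		start = atomic_load(&z->zone_start_pfn);
		end = start + atomic_load(&z->spanned_pages);
	} while (atomic_load(&z->span_seq) != seq);

	return start <= pfn && pfn < end;
}
```

The read side never sleeps, it only retries while a writer is active, so unlike get_online_mems() it should be usable under zone->lock. It only guarantees a consistent snapshot of start and size, though: the span can still change right after the function returns, so it closes the inconsistent-read window of the check itself rather than keeping the range stable for the caller.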

Thoughts?

-- 
Oscar Salvador
SUSE L3