On 2/16/21 1:34 PM, Vlastimil Babka wrote:
> On 2/16/21 12:01 PM, Mike Rapoport wrote:
>>>
>>> I do understand that. And I am not objecting to the patch. I have to
>>> confess I haven't digested it yet. Any changes to early memory
>>> initialization have turned out to be subtle and corner cases only pop up
>>> later. This is almost impossible to review just by reading the code.
>>> That's why I am asking whether we want to address the specific VM_BUG_ON
>>> first with something much less tricky and actually reviewable. And
>>> that's why I am asking whether dropping the bug_on itself is safe to do
>>> and use as a hot fix which should be easier to backport.
>>
>> I can't say I'm familiar enough with migration and compaction code to say
>> if it's ok to remove that bug_on. It does point to inconsistency in the
>> memmap, but probably it's not important.
>
> On closer look, removing the VM_BUG_ON_PAGE() in set_pfnblock_flags_mask() is
> not safe. If we violate the zone_spans_pfn condition, it means we will write
> outside of the pageblock bitmap for the zone, and corrupt something. Actually

Clarification. This is true only for !CONFIG_SPARSEMEM, which is unlikely in
practice to produce the configurations that trigger this issue.
So we can remove the VM_BUG_ON_PAGE().

> similar thing can happen in __get_pfnblock_flags_mask() where there's no
> VM_BUG_ON, but there we can't corrupt memory. But we could theoretically fault
> to do accessing some unmapped range?
>
> So the checks would have to become unconditional !DEBUG_VM and return instead of
> causing a BUG. Or we could go back one level and add some checks to
> fast_isolate_around() to detect a page from zone that doesn't match cc->zone.
> The question is if there is another code that will break if a page_zone()
> suddenly changes e.g. in the middle of the pageblock - __pageblock_pfn_to_page()
> assumes that if first and last page is from the same zone, so are all pages in
> between, and the rest relies on that. But maybe if Andrea's
> fast_isolate_around() issue is fixed, that's enough for stable backport.
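
For illustration only, here is a minimal userspace sketch of the flat
(!CONFIG_SPARSEMEM) case described above. It is not kernel code. The model_*
names, the pageblock order of 9 and the 4 flag bits per pageblock are
assumptions made for this example. It shows how a pfn outside the zone span
yields a bit index past the end of the zone's pageblock bitmap, and how a
debug-only check differs from an unconditional one that returns instead of
causing a BUG.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGEBLOCK_ORDER 9UL   /* assumed: 512 pages per pageblock */
#define MODEL_PAGEBLOCK_BITS  4UL   /* assumed: 4 flag bits per pageblock */
#define MODEL_PAGEBLOCK_PAGES (1UL << MODEL_PAGEBLOCK_ORDER)

/* Hypothetical, heavily simplified stand-in for struct zone. */
struct model_zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* Same idea as zone_spans_pfn(): is pfn within [start, start + spanned)? */
static inline bool model_zone_spans_pfn(const struct model_zone *z,
					unsigned long pfn)
{
	return pfn >= z->zone_start_pfn &&
	       pfn < z->zone_start_pfn + z->spanned_pages;
}

/* Zone start rounded down to a pageblock boundary. */
static inline unsigned long model_base_pfn(const struct model_zone *z)
{
	return z->zone_start_pfn & ~(MODEL_PAGEBLOCK_PAGES - 1);
}

/* Number of valid bits in the zone's single flat pageblock bitmap. */
static inline unsigned long model_bitmap_bits(const struct model_zone *z)
{
	unsigned long end = z->zone_start_pfn + z->spanned_pages;
	unsigned long pages = end - model_base_pfn(z);
	unsigned long blocks = (pages + MODEL_PAGEBLOCK_PAGES - 1)
				>> MODEL_PAGEBLOCK_ORDER;

	return blocks * MODEL_PAGEBLOCK_BITS;
}

/* Flat-bitmap index with no range check at all. */
static inline unsigned long model_pfn_to_bitidx(const struct model_zone *z,
						unsigned long pfn)
{
	return ((pfn - model_base_pfn(z)) >> MODEL_PAGEBLOCK_ORDER)
		* MODEL_PAGEBLOCK_BITS;
}

/*
 * Debug-only check: assert() compiles away under NDEBUG, in the same
 * spirit as VM_BUG_ON_PAGE() compiling away without CONFIG_DEBUG_VM.
 */
static inline unsigned long model_bitidx_debug(const struct model_zone *z,
					       unsigned long pfn)
{
	assert(model_zone_spans_pfn(z, pfn));
	return model_pfn_to_bitidx(z, pfn);
}

/* Unconditional check: report failure to the caller instead of crashing. */
static inline bool model_bitidx_checked(const struct model_zone *z,
					unsigned long pfn,
					unsigned long *bitidx)
{
	if (!model_zone_spans_pfn(z, pfn))
		return false;
	*bitidx = model_pfn_to_bitidx(z, pfn);
	return true;
}
```

With a zone starting at pfn 0x1000 and spanning 0x1000 pages, the bitmap in
this model has 32 bits, while pfn 0x2000 (one past the zone end) already maps
to bit index 32. A pfn below the zone start wraps around to a very large
index. With CONFIG_SPARSEMEM the index is instead derived from the pfn's
offset within its section, so it stays inside that section's bitmap.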