Re: [syzbot] [mm?] kernel BUG in vma_replace_policy

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Thu, 14 Sep 2023 20:09:03 +0100

On Thu, Sep 14, 2023 at 06:20:56PM +0000, Suren Baghdasaryan wrote:
> I think I found the problem and the explanation is much simpler. While
> walking the page range, queue_folios_pte_range() encounters an
> unmovable page and queue_folios_pte_range() returns 1. That causes a
> break from the loop inside walk_page_range() and no more VMAs get
> locked. After that the loop calling mbind_range() walks over all VMAs,
> even the ones which were skipped by queue_folios_pte_range() and that
> causes this BUG assertion.
> 
> Thinking about what's the right way to handle this situation (what's the
> expected behavior here)...
> I think the safest way would be to modify walk_page_range() and make
> it continue calling process_vma_walk_lock() for all VMAs in the range
> even when __walk_page_range() returns a positive err. Any objection or
> alternative suggestions?

So we only return 1 here if MPOL_MF_MOVE* & MPOL_MF_STRICT were
specified.  That means we're going to return an error, no matter what,
and there's no point in calling mbind_range().  Right?

--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1334,6 +1334,8 @@ static long do_mbind(unsigned long start, unsigned long len,
        ret = queue_pages_range(mm, start, end, nmask,
                          flags | MPOL_MF_INVERT, &pagelist, true);

+       if (ret == 1)
+               ret = -EIO;
        if (ret < 0) {
                err = ret;
                goto up_out;

(I don't really understand this code, so it can't be this simple, can
it?  Why don't we just return -EIO from queue_folios_pte_range() if
this is the right answer?)