Re: [PATCH] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY

Mina Almasry <almasrymina@xxxxxxxxxx> · Wed, 12 May 2021 14:52:54 -0700

On Wed, May 12, 2021 at 2:31 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 5/12/21 1:14 PM, Peter Xu wrote:
> > On Wed, May 12, 2021 at 12:42:32PM -0700, Mina Almasry wrote:
> >>>>> @@ -4868,30 +4869,39 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >>>>> +       WARN_ON(*pagep);
> >>>>
> >>>> I don't think this warning works, because we do set *pagep, in the
> >>>> copy_huge_page_from_user failure case. In that case, the following
> >>>> happens:
> >>>>
> >>>> 1. We set *pagep, and return immediately.
> >>>> 2. Our caller notices this particular error, drops mmap_lock, and then
> >>>> calls us again with *pagep set.
> >>>>
> >>>> In this path, we're supposed to just re-use this existing *pagep
> >>>> instead of allocating a second new page.
> >>>>
> >>>> I think this also means we need to keep the "else" case where *pagep
> >>>> is set below.
> >>>>
> >>>
> >>> +1 to Peter's comment.
> >>>
>
> Apologies to Axel (and Peter) as that comment was from Axel.
>
> >>
> >> Gah, sorry about that. I'll fix in v2.
> >
> > I have a question regarding v1: how do you guarantee huge_add_to_page_cache()
> > won't fail again even if checked before page alloc?  Say, what if the page
> > cache got inserted after hugetlbfs_pagecache_present() (which is newly added in
> > your v1) but before huge_add_to_page_cache()?
>
> In the caller (__mcopy_atomic_hugetlb) we obtain the hugetlb fault mutex
> before calling this routine.  This should prevent changes to the cache
> while in the routine.
>
> However, things get complicated in the case where copy_huge_page_from_user
> fails.  In this case, we will return to the caller which will drop mmap_lock
> and the hugetlb fault mutex before doing the copy.  After dropping the
> mutex, someone could populate the cache.  This would result in the same
> situation where two reserves are 'temporarily' consumed for the same
> mapping offset.  By the time we get to the second call to
> hugetlb_mcopy_atomic_pte where the previously allocated page is passed
> in, it is too late.
>

Thanks. I tried locally to allocate a page, then add it into the
cache, *then* copy its contents (dropping that lock if that fails).
That also has the test passing, but I'm not sure if I'm causing a fire
somewhere else by having a page in the cache that has uninitialized
contents. The only other code that checks the cache seems to be the
hugetlb_fault/hugetlb_cow code. I'm reading that code to try to
understand if I'm breaking that code doing this.

> --
> Mike Kravetz