On Tue, Oct 01, 2024 at 04:31:31PM +0100, Ryan Roberts wrote:
> On 01/10/2024 16:10, Peter Xu wrote:
> > On Tue, Oct 01, 2024 at 03:27:48PM +0100, Ryan Roberts wrote:
> >> Hi Peter,
> >>
> >> On 08/08/2024 12:25, Ryan Roberts wrote:
> >>> On 07/08/2024 19:59, Peter Xu wrote:
> >>>> On Wed, Aug 07, 2024 at 12:18:18PM +0200, David Hildenbrand wrote:
> >>>>> On 07.08.24 10:58, David Hildenbrand wrote:
> >>>>>> On 06.08.24 22:29, Peter Xu wrote:
> >>>>>>> On Tue, Aug 06, 2024 at 06:37:55PM +0200, David Hildenbrand wrote:
> >>>>>>>> On 06.08.24 17:15, Ryan Roberts wrote:
> >>>>>>>>> Hi Peter, David,
> >>>>>>>
> >>>>>>> Hi, Ryan,
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> syzkaller has found an issue (at least on arm64, but I suspect it
> >>>>>>>>> will be visible on x86_64 too) that triggers the following warning:
> >>>>>>>
> >>>>>>> This is true. I can easily reproduce..
> >>>>>>>
> >>
> >> [...]
> >>
> >>>> Looking at this specific issue again, it's more than PTEs that need
> >>>> the uffd-wp bit removed. We have:
> >>>>
> >>>> - pmd/pud/hugetlb in other paths that will need similar care..
> >>>>
> >>>> - move_page_tables() smartness on HAVE_MOVE_PUD.. where we may need to
> >>>>   walk the pmd page removing the bits when necessary..
> >>>>
> >>>> - more importantly, mremap_userfaultfd_prep() might be too late if it
> >>>>   runs after moving the pgtables..
> >>>>
> >>>> - [not yet started looking] the mlock issue Ryan mentioned..
> >>>>
> >>>> Looks like we'll need more things to fix and test..
> >>>>
> >>>> I wish I could simply disable UFFD_WP + EVENT_REMAP, but I think even
> >>>> with that, by default mremap() should still logically tear down all
> >>>> those uffd-wp bits, which is the same behaviour as !EVENT_REMAP now..
> >>>>
> >>>> Let me know if anyone would like to beat me to it on fixing the whole
> >>>> thing, I'd be more than happy..
> >>>
> >>> Afraid I won't be able to sign up to doing that work.
> >>>
> >>>> Otherwise, I'll probably need to postpone the fix of this issue for
> >>>> 1-2 weeks but finish some other things first..
> >>
> >> I'm not sure if there was any progress on this? We are still seeing the
> >> problem on v6.12-rc1.
> >
> > Hi, Ryan,
> >
> > I haven't yet got free time to look at this, sorry. I confess I didn't
> > prioritize it highly, as I doubt anyone would make real use of it, or
> > hit this issue in real workloads, and the fix will even slow down
> > generic workloads, if only slightly.
>
> No problem; I'm really just acting as the middleman. Given -rc1 is out,
> Mark has been running his usual fuzzing and noted that the issue still
> exists. So I thought I'd just enquire to see if you were able to make any
> progress. I agree it's not high priority. Although for a panic_on_warn=1
> kernel (which I understand some use in deployment), this means that user
> space can panic the system, so I guess it needs to be addressed
> eventually.
>
> >
> > Do you want to have a look? It'll be great if so. Or I can try to find
> > some time this month.
>
> I won't personally get time to look at this, since I'm busy with some
> other commitments. But I might be able to find someone to look into it.
> Leave it with me for now.

Thank you! If there are patches, I can definitely try to review them. Or if
this won't get addressed before someone else pokes again, I'll do it.

Thanks,

-- 
Peter Xu