Re: [PATCH v2 13/13] mm/gup: move private gup FOLL_ flags to internal.h

Jason Gunthorpe <jgg@xxxxxxxxxx> · Thu, 26 Jan 2023 11:05:27 -0400

On Thu, Jan 26, 2023 at 03:46:09PM +0100, David Hildenbrand wrote:
> On 26.01.23 15:41, Claudio Imbrenda wrote:
> > On Thu, 26 Jan 2023 08:55:27 -0400
> > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > 
> > > On Thu, Jan 26, 2023 at 01:48:46PM +0100, David Hildenbrand wrote:
> > > > On 24.01.23 21:34, Jason Gunthorpe wrote:
> > > > > Move the flags that should not/are not used outside gup.c and related into
> > > > > mm/internal.h to discourage driver abuse.
> > > > > 
> > > > > To make this more maintainable going forward, compact the two FOLL ranges
> > > > > with new bit numbers from 0 to 11 and 16 to 21, using shifts so it is
> > > > > explicit.
> > > > > 
> > > > > Switch to an enum so the whole thing is easier to read.
> > > > 
> > > > Using a __bitwise type would be even better, but that requires quite some
> > > > adjustments ...
> > > > 
> > > > The primary leftover for FOLL_GET seems to be follow_page(). IIRC, there is
> > > > only one caller that doesn't pass FOLL_GET (s390). We could either add a new
> > > > function to "probe" that anything is mapped (IIRC that's the use case), or
> > > > simply ref+unref.
> > > 
> > > Is that code even safe as written? I don't really understand how it
> > 
> > yes (surprisingly) it is
> > 
> > > can safely call lock_page() on something it doesn't have a reference
> > > to?
> > 
> > the code between lock_page and unlock_page will behave "properly" and
> > do nothing or at worst cause a tiny performance issue in the rare case
> > something changes between the follow_page and the page_lock, i.e. if
> > things are done on the wrong page.
> 
> What prevents the page from getting unmapped (MADV_DONTNEED), freed,
> reallocated as a larger folio and the unlock_page() would target the wrong
> bit? I think even while freeing a locked page we might run into trouble ...

Yep. 

The issue is you can't call lock_page() on something you don't have a
ref to.

The worst case would be that the memory got unmapped from the VMA and the
entire memory space was hot-unplugged, e.g. it was DAX or something. Now
the page pointer will oops if you call lock_page().

Why not just use get_locked_pte() exclusively and return -EAGAIN or
-EBUSY if folio_trylock() fails, under the PTL? This already happens
for the PageWriteback case.
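
Roughly the shape I mean, as a minimal userspace model and not kernel
code. Every model_* name below is a hypothetical stand-in (the atomic
flags play the role of the PTL and of PG_locked), and the -EFAULT for
"nothing mapped" is an assumption of the sketch. The point is only the
ordering: look up the page and trylock it while the PTL is still held,
never sleep waiting for the page lock, and return -EAGAIN on contention.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
struct model_page {
	atomic_flag locked;	/* models the page lock bit */
	int touched;		/* counts work done under the page lock */
};

struct model_pte {
	atomic_flag ptl;		/* models the page table spinlock */
	struct model_page *page;	/* NULL models "nothing mapped" */
};

static void model_ptl_lock(struct model_pte *pte)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&pte->ptl,
						 memory_order_acquire))
		;
}

static void model_ptl_unlock(struct model_pte *pte)
{
	atomic_flag_clear_explicit(&pte->ptl, memory_order_release);
}

/* Returns true if we took the page lock, false if it was already held. */
static bool model_trylock_page(struct model_page *page)
{
	return !atomic_flag_test_and_set(&page->locked);
}

static void model_unlock_page(struct model_page *page)
{
	atomic_flag_clear(&page->locked);
}

/*
 * Returns 0 on success, -EFAULT if nothing is mapped, -EAGAIN if the
 * page lock is contended. The page is only touched while the PTL is
 * held, i.e. while the mapping that names it cannot go away.
 */
static int model_op_on_mapped_page(struct model_pte *pte)
{
	struct model_page *page;
	int ret = 0;

	assert(pte != NULL);

	model_ptl_lock(pte);
	page = pte->page;
	if (!page) {
		ret = -EFAULT;
	} else if (!model_trylock_page(page)) {
		ret = -EAGAIN;	/* caller retries; no sleeping under the PTL */
	} else {
		page->touched++;	/* the real per-page work would go here */
		model_unlock_page(page);
	}
	model_ptl_unlock(pte);

	return ret;
}
```

In this model the caller just loops or backs off on -EAGAIN, the same
way it would have to for any other transient failure.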

Jason