Re: [PATCH v2 13/13] mm/gup: move private gup FOLL_ flags to internal.h

Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx> · Thu, 26 Jan 2023 16:39:02 +0100

On Thu, 26 Jan 2023 11:05:27 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Thu, Jan 26, 2023 at 03:46:09PM +0100, David Hildenbrand wrote:
> > On 26.01.23 15:41, Claudio Imbrenda wrote:  
> > > On Thu, 26 Jan 2023 08:55:27 -0400
> > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > >   
> > > > On Thu, Jan 26, 2023 at 01:48:46PM +0100, David Hildenbrand wrote:  
> > > > > On 24.01.23 21:34, Jason Gunthorpe wrote:  
> > > > > > Move the flags that should not/are not used outside gup.c and related into
> > > > > > mm/internal.h to discourage driver abuse.
> > > > > > 
> > > > > > To make this more maintainable going forward compact the two FOLL ranges
> > > > > > with new bit numbers from 0 to 11 and 16 to 21, using shifts so it is
> > > > > > > explicit.
> > > > > > 
> > > > > > Switch to an enum so the whole thing is easier to read.  
> > > > > 
> > > > > Using a __bitwise type would be even better, but that requires quite some
> > > > > adjustments ...
> > > > > 
> > > > > The primary leftover for FOLL_GET seems to be follow_page(). IIRC, there is
> > > > > only one caller that doesn't pass FOLL_GET (s390). We could either add a new
> > > > > function to "probe" that anything is mapped (IIRC that's the use case), or
> > > > > simply ref+unref.  
> > > > 
> > > > Is that code even safe as written? I don't really understand how it  
> > > 
> > > yes (surprisingly) it is
> > >   
> > > > can safely call lock_page() on something it doesn't have a reference
> > > > to?  
> > > 
> > > the code between lock_page and unlock_page will behave "properly" and
> > > do nothing or at worst cause a tiny performance issue in the rare case
> > > something changes between the follow_page and the page_lock, i.e. if
> > > things are done on the wrong page.  
> > 
> > What prevents the page from getting unmapped (MADV_DONTNEED), freed,
> > reallocated as a larger folio and the unlock_page() would target the wrong
> > bit? I think even while freeing a locked page we might run into trouble ...  
> 
> Yep. 
> 
> The issue is you can't call lock_page() on something you don't have a
> ref to.

so we have been doing this wrong the whole time? oops

> 
> The worst case would be the memory got unmapped from the VMA and the
> entire memory space was hot-unplugged, e.g. it was DAX or something. Now
> the page pointer will oops if you call lock_page.

we do not have memory-mapped devices or anything, so this scenario is
highly unlikely (at least this one)

> 
> Why not just use the get_locked_pte() exclusively and do -EAGAIN or
> -EBUSY if folio_try_lock fails, under the PTL? This already happens
> for PageWriteback cases.

I think I will need some time to process this sentence
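
to check my understanding, here is how I read the suggestion, as a
user-space model only. pthread mutexes stand in for the PTL and the
page lock; struct fake_pte, struct fake_page and convert_under_ptl()
are hypothetical names, not kernel code. the point is only the
ordering: look up and trylock the page while the PTL-like lock is still
held, and back off with -EAGAIN instead of sleeping.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical user-space stand-ins, not kernel types. */
struct fake_page {
	pthread_mutex_t lock;	/* stands in for the page/folio lock */
	int converted;		/* set once the "conversion" has run */
};

struct fake_pte {
	pthread_mutex_t ptl;	/* stands in for the page table lock */
	struct fake_page *page;	/* NULL means nothing is mapped */
};

/*
 * Try to lock the page while still holding the PTL-like lock, so the
 * mapping cannot change between the lookup and the lock attempt.
 * Never sleeps on the page lock: returns -EAGAIN so the caller retries.
 */
static int convert_under_ptl(struct fake_pte *pte)
{
	int ret = 0;

	pthread_mutex_lock(&pte->ptl);
	if (!pte->page) {
		ret = -EFAULT;		/* nothing mapped here */
	} else if (pthread_mutex_trylock(&pte->page->lock) != 0) {
		ret = -EAGAIN;		/* page lock is busy, back off */
	} else {
		pte->page->converted = 1;
		pthread_mutex_unlock(&pte->page->lock);
	}
	pthread_mutex_unlock(&pte->ptl);
	return ret;
}
```

whether -EAGAIN or -EBUSY is the better return value for the busy case
is left open here, as in the suggestion itself.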

I can tell you that the original goal of that function is to make sure
that there are no extra references. in particular, we want to prevent
any kind of I/O from being in flight while the page becomes secure.
(the I/O would fail and, depending on which device it was, the whole
system might end up in a rather unhappy state)
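
as a rough illustration of the "no extra references" idea, again a
user-space model and not the actual s390 code: the count is "frozen"
(swapped to 0) only if it matches what we expect, and otherwise we give
up with -EBUSY. struct model_page, make_secure_if_unshared() and the
"expected" parameter are made up for this sketch.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of a page with a reference count. */
struct model_page {
	atomic_int refcount;
	int secure;
};

/*
 * Make the page "secure" only if nobody holds an unexpected reference.
 * The compare-and-swap to 0 models freezing the count: it fails if
 * the count differs from 'expected', e.g. because I/O holds a reference.
 */
static int make_secure_if_unshared(struct model_page *p, int expected)
{
	int want = expected;

	if (!atomic_compare_exchange_strong(&p->refcount, &want, 0))
		return -EBUSY;		/* extra reference, do not convert */

	p->secure = 1;
	atomic_store(&p->refcount, expected);	/* unfreeze */
	return 0;
}
```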

transitioning from secure to non-secure, on the other hand, is much easier

> 
> Jason