On Thu, 26 Jan 2023 11:05:27 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Thu, Jan 26, 2023 at 03:46:09PM +0100, David Hildenbrand wrote:
> > On 26.01.23 15:41, Claudio Imbrenda wrote:
> > > On Thu, 26 Jan 2023 08:55:27 -0400
> > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > >
> > > > On Thu, Jan 26, 2023 at 01:48:46PM +0100, David Hildenbrand wrote:
> > > > > On 24.01.23 21:34, Jason Gunthorpe wrote:
> > > > > > Move the flags that should not/are not used outside gup.c and related into
> > > > > > mm/internal.h to discourage driver abuse.
> > > > > >
> > > > > > To make this more maintainable going forward compact the two FOLL ranges
> > > > > > with new bit numbers from 0 to 11 and 16 to 21, using shifts so it is
> > > > > > explicit.
> > > > > >
> > > > > > Switch to an enum so the whole thing is easier to read.
> > > > >
> > > > > Using a __bitwise type would be even better, but that requires quite some
> > > > > adjustments ...
> > > > >
> > > > > The primary leftover for FOLL_GET seems to be follow_page(). IIRC, there is
> > > > > only one caller that doesn't pass FOLL_GET (s390). We could either add a new
> > > > > function to "probe" that anything is mapped (IIRC that's the use case), or
> > > > > simply ref+unref.
> > > >
> > > > Is that code even safe as written? I don't really understand how it
> > >
> > > yes (surprisingly) it is
> > >
> > > > can safely call lock_page() on something it doesn't have a reference
> > > > to?
> > >
> > > the code between lock_page and unlock_page will behave "properly" and
> > > do nothing or at worst cause a tiny performance issue in the rare case
> > > something changes between the follow_page and the page_lock, i.e. if
> > > things are done on the wrong page.
> >
> > What prevents the page from getting unmapped (MADV_DONTNEED), freed,
> > reallocated as a larger folio and the unlock_page() would target the wrong
> > bit? I think even while freeing a locked page we might run into trouble ...
>
> Yep.
>
> The issue is you can't call lock_page() on something you don't have a
> ref to.

so we have been doing this wrong the whole time? oops

>
> The worst case would be the memory got unmapped from the VMA and the
> entire memory space was hot-unplugged, e.g. it was DAX or something. Now
> the page pointer will oops if you call lock_page.

we do not have memory mapped devices or anything, so this scenario is
highly unlikely (at least this one)

>
> Why not just use the get_locked_pte() exclusively and do -EAGAIN or
> -EBUSY if folio_try_lock fails, under the PTL? This already happens
> for PageWriteback cases.

I think I will need some time to process this sentence

I can tell you that the original goal of that function is to make sure
that there are no extra references. in particular, we want to prevent
I/O of any kind from being ongoing while the page becomes secure. (the
I/O will fail and, depending on which device it was, the whole system
might end up in a rather unhappy state)

transitioning from secure to non-secure instead is much easier

>
> Jason
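
For reference, the "trylock under the PTL, otherwise return -EBUSY"
idea quoted above can be modelled in plain userspace C. This is only a
rough sketch under stated assumptions, not kernel code: the demo_*
names are hypothetical, a pthread mutex stands in for the page table
lock and another for the page lock, and a NULL pointer stands in for
"nothing mapped". The point it tries to show is that the page is only
looked at while the outer lock keeps the mapping stable, and that the
inner lock is never slept on.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct demo_page {
	pthread_mutex_t lock;	/* models the page (folio) lock */
};

struct demo_pte {
	pthread_mutex_t ptl;	/* models the page table lock */
	struct demo_page *page;	/* NULL models "nothing mapped" */
};

/*
 * Act on the mapped page without ever sleeping on its lock.
 * Returns 0 on success, -ENOENT if nothing is mapped, and -EBUSY if
 * the page lock is contended, in which case the caller may retry.
 */
static int demo_make_secure(struct demo_pte *pte)
{
	int ret = 0;

	assert(pte != NULL);

	/* The mapping cannot change while the outer lock is held. */
	pthread_mutex_lock(&pte->ptl);

	if (pte->page == NULL) {
		ret = -ENOENT;
		goto out;
	}

	/* Never block here: fail fast and let the caller retry. */
	if (pthread_mutex_trylock(&pte->page->lock) != 0) {
		ret = -EBUSY;
		goto out;
	}

	/* ... the actual work on the page would go here ... */

	pthread_mutex_unlock(&pte->page->lock);
out:
	pthread_mutex_unlock(&pte->ptl);
	return ret;
}
```

A caller in this model would loop or bail out on -EBUSY instead of
waiting on the page lock. Whether the real code can be structured this
way is exactly the open question in this thread.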