On Fri, Jan 28, 2022 at 09:36:14AM +0800, Peter Xu wrote: > On Thu, Jan 27, 2022 at 11:25:38AM -0400, Jason Gunthorpe wrote: > > On Thu, Jan 27, 2022 at 05:19:56PM +0800, Peter Xu wrote: > > > > > > > > diff --git a/mm/gup.c b/mm/gup.c > > > > > > index f0af462ac1e2..8ebc04058e97 100644 > > > > > > +++ b/mm/gup.c > > > > > > @@ -440,7 +440,7 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, > > > > > > pte_t *pte, unsigned int flags) > > > > > > { > > > > > > /* No page to get reference */ > > > > > > - if (flags & FOLL_GET) > > > > > > + if (flags & (FOLL_GET | FOLL_PIN)) > > > > > > return -EFAULT; > > > > > > > > > > Yes. This clearly fixes the problem that the patch describes, and also > > > > > clearly matches up with the Fixes tag. So that's correct. > > > > > > > > It is a really confusing though, why not just always return -EEXIST > > > > here? > > > > > > Because in current code GUP handles -EEXIST and -EFAULT differently? > > > > That has nothing to do with here. We shouldn't be deciding what the > > top layer does way down here. Return the correct error code for what > > was discovered at this layer the upper loop should make the decision > > what it should do > > > > > We do early bail out on -EFAULT. -EEXIST was first introduced in 2015 from > > > Kirill for not failing some mlock() or mmap(MAP_POPULATE) on dax (1027e4436b6). > > > Then in 2017 it got used again with pud-sized thp (a00cc7d9dd93d) on dax too. > > > They seem to service the same goal and it seems to be designed that -EEXIST > > > shouldn't fail GUP immediately. > > > > It must fail GUP immeidately if there is a pages list. > > Right, but my point is we don't have an user at all for follow_page_mask() > returning -EEXIST with a **page which is non-NULL. Or did I miss > it? We don't, but that isn't the point - it is about understandability not utility. There are only two call chains that involve follow_page_mask(), one always wants the -EEXIST and the other wants -EEXIST sometimes and sometimes wants to return -EFAULT for -EEXIST The solution is not to change follow_page_mask() but to be consistent throughout that -EEXIST means we encountered a populated but non-struct page PTE and the single place that wants -EEXIST to be reported as -EFAULT (__get_user_pages) should make that transformation. For instance, should we return -EEXIST in other cases like devmap and more that have PTEs present but are not returnable? It is already really confusing (and probably wrong) that we sometimes return NULL for populated PTEs. NULL causes faultin_page() to be called on something we already know is populated, seems like nonsense. I certainly don't want to see every place like that doing some if to transform -EEXIST into -EFAULT. > > Callers that want an early failure must pass in NULL for pages, it is > > just that simple. It has nothing to do with the FOLL flags. > > > > A WARN_ON would be appropriate to compare the FOLL flags against the > > pages. eg FOLL_GET without a pages is nonsense and should be > > immediately aborted. On the other hand, we avoid this by construction > > internal to gup.c > > We have something like that already, although it's only a VM_BUG_ON() not a > BUG_ON() or WARN_ON() at the entry of __get_user_pages(): > > VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN))); Right > I believe this works too and I think I get your point, but as stated above it's > just not used yet so the path is not useful to any real code path. It is not about useful, it is about understandable and consistency.. There should be clear rules about when and what errno to return in every case, we should trend in that direction. Jason