Re: Possible race with page_maybe_dma_pinned?

Jan Kara <jack@xxxxxxx> · Thu, 30 Sep 2021 13:11:20 +0200

On Wed 29-09-21 18:57:33, John Hubbard wrote:
> On 9/29/21 15:47, Linus Torvalds wrote:
> > On Wed, Sep 29, 2021 at 12:57 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > 
> > > Now we have 3 callers of page_maybe_dma_pinned():
> > > 
> > >          1. page_needs_cow_for_dma
> > >          2. pte_is_pinned
> > >          3. shrink_page_list
> > > 
> > > The 1st one is good as it takes the seqlock for write properly.  The 2nd & 3rd
> > > are missing, we may need to add them.
> > 
> > Well, the pte_is_pinned() case at least could do the seqlock in
> > clear_soft_dirty() - it has the vma and mm available.
> 
> That part seems straightforward, OK.
> 
> > 
> > The page shrinker has always been problematic since it doesn't have
> > the vm (and by "always" I mean "modern times" - long ago we used to
> > scan virtually, in the days before rmap)
> > 
> > One option might be for fast-gup to give up on locked pages. I think
> > the page lock is the only thing that shrink_page_list() serializes
> > with.
> > 
> 
> In order to avoid locked pages in gup fast, it is easiest to do a
> check for locked pages *after* fast-pinning them, and unpin them before
> returning to the caller. This makes the change much smaller.
> 
> However, doing so leaves a window of time during which the pages are
> still marked as maybe-dma-pinned, although those pages are never
> returned to the caller as such. There is already code that is subject to
> this in lockless_pages_from_mm(), for the case of a failed seqlock read.
> I'm thinking it's probably OK, because the pages are not going to be
> held long-term. They will be unpinned before returning from
> lockless_pages_from_mm().
> 
> The counter argument is that this is merely making the race window
> smaller, which is usually something that I argue against because it just
> leads to harder-to-find bugs...

Yeah, what you propose actually does not guarantee that the reclaim and
page pinning cannot race in a way that the page gets unmapped and
gup_fast() returns a pinned page. Which is what the code in
shrink_page_list() tries to prevent AFAIU.

The only place where I can see us doing a sanely race-free check in
shrink_page_list() path is inside try_to_unmap() - we could implement
unmap_if_not_pinned semantics and inside the rmap walk we can bump up the
seqcounts (it even conceptually makes some sense) to make sure
page_maybe_dma_pinned() check we'd do during the rmap walk is not racing
with pup_fast(). By the time we leave the seqcount critical section, the
page will be unmapped so we can be sure there will be no new pins of the
page.

								Honza
> 
> To be specific, here's what I mean:
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 886d6148d3d03..8ba871a927668 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2657,6 +2657,7 @@ static unsigned long lockless_pages_from_mm(unsigned long start,
>  {
>  	unsigned long flags;
>  	int nr_pinned = 0;
> +	int i;
>  	unsigned seq;
> 
>  	if (!IS_ENABLED(CONFIG_HAVE_FAST_GUP) ||
> @@ -2693,7 +2694,23 @@ static unsigned long lockless_pages_from_mm(unsigned long start,
>  			unpin_user_pages(pages, nr_pinned);
>  			return 0;
>  		}
> +
> +		/*
> +		 * Avoiding locked pages, in this fast/lockless context, will
> +		 * avoid interfering with shrink_page_list(), in particular.
> +		 * Give up upon finding the first locked page, but keep the
> +		 * earlier pages, so that slow gup does not have to re-pin them.
> +		 */
> +		for (i = 0; i < nr_pinned; i++)  {
> +			if (PageLocked(pages[i])) {
> +				unpin_user_pages(&pages[i], nr_pinned - i);
> +				nr_pinned = i + 1;
> +				break;
> +			}
> +		}
>  	}
> +
> +
>  	return nr_pinned;
>  }
> 
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR