On 9/29/21 15:47, Linus Torvalds wrote:
On Wed, Sep 29, 2021 at 12:57 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
Now we have 3 callers of page_maybe_dma_pinned():
1. page_needs_cow_for_dma
2. pte_is_pinned
3. shrink_page_list
The 1st one is good as it takes the seqlock for write properly. The 2nd & 3rd
are missing, we may need to add them.
Well, the pte_is_pinned() case at least could do the seqlock in
clear_soft_dirty() - it has the vma and mm available.
That part seems straightforward, OK.
The page shrinker has always been problematic since it doesn't have
the vm (and by "always" I mean "modern times" - long ago we used to
scan virtually, in the days before rmap)
One option might be for fast-gup to give up on locked pages. I think
the page lock is the only thing that shrink_page_list() serializes
with.
In order to avoid locked pages in gup fast, it is easiest to do a
check for locked pages *after* fast-pinning them, and unpin them before
returning to the caller. This makes the change much smaller.
However, doing so leaves a window of time during which the pages are
still marked as maybe-dma-pinned, although those pages are never
returned to the caller as such. There is already code that is subject to
this in lockless_pages_from_mm(), for the case of a failed seqlock read.
I'm thinking it's probably OK, because the pages are not going to be
held long-term. They will be unpinned before returning from
lockless_pages_from_mm().
The counter argument is that this is merely making the race window
smaller, which is usually something that I argue against because it just
leads to harder-to-find bugs...
To be specific, here's what I mean:
diff --git a/mm/gup.c b/mm/gup.c
index 886d6148d3d03..8ba871a927668 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2657,6 +2657,7 @@ static unsigned long lockless_pages_from_mm(unsigned long start,
{
unsigned long flags;
int nr_pinned = 0;
+ int i;
unsigned seq;
if (!IS_ENABLED(CONFIG_HAVE_FAST_GUP) ||
@@ -2693,7 +2694,23 @@ static unsigned long lockless_pages_from_mm(unsigned long start,
unpin_user_pages(pages, nr_pinned);
return 0;
}
+
+ /*
+ * Avoiding locked pages, in this fast/lockless context, will
+ * avoid interfering with shrink_page_list(), in particular.
+ * Give up upon finding the first locked page, but keep the
+ * earlier pages, so that slow gup does not have to re-pin them.
+ */
+ for (i = 0; i < nr_pinned; i++) {
+ if (PageLocked(pages[i])) {
+ unpin_user_pages(&pages[i], nr_pinned - i);
+ nr_pinned = i + 1;
+ break;
+ }
+ }
}
+
+
return nr_pinned;
}
thanks,
--
John Hubbard
NVIDIA