Re: [PATCH v4 1/1] mm: introduce put_user_page*(), placeholder versions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/19/2019 3:45 PM, Jerome Glisse wrote:
On Tue, Mar 19, 2019 at 03:43:44PM -0500, Tom Talpey wrote:
On 3/19/2019 4:03 AM, Ira Weiny wrote:
On Tue, Mar 19, 2019 at 04:36:44PM +0100, Jan Kara wrote:
On Tue 19-03-19 17:29:18, Kirill A. Shutemov wrote:
On Tue, Mar 19, 2019 at 10:14:16AM -0400, Jerome Glisse wrote:
On Tue, Mar 19, 2019 at 09:47:24AM -0400, Jerome Glisse wrote:
On Tue, Mar 19, 2019 at 03:04:17PM +0300, Kirill A. Shutemov wrote:
On Fri, Mar 08, 2019 at 01:36:33PM -0800, john.hubbard@xxxxxxxxx wrote:
From: John Hubbard <jhubbard@xxxxxxxxxx>

[...]

diff --git a/mm/gup.c b/mm/gup.c
index f84e22685aaa..37085b8163b1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -28,6 +28,88 @@ struct follow_page_context {
   	unsigned int page_mask;
   };
+typedef int (*set_dirty_func_t)(struct page *page);
+
+static void __put_user_pages_dirty(struct page **pages,
+				   unsigned long npages,
+				   set_dirty_func_t sdf)
+{
+	unsigned long index;
+
+	for (index = 0; index < npages; index++) {
+		struct page *page = compound_head(pages[index]);
+
+		if (!PageDirty(page))
+			sdf(page);

How is this safe? What prevents the page to be cleared under you?

If it's safe to race clear_page_dirty*() it has to be stated explicitly
with a reason why. It's not very clear to me as it is.

The PageDirty() optimization above is fine to race with clear the
page flag as it means it is racing after a page_mkclean() and the
GUP user is done with the page so page is about to be write back
ie if (!PageDirty(page)) see the page as dirty and skip the sdf()
call while a split second after TestClearPageDirty() happens then
it means the racing clear is about to write back the page so all
is fine (the page was dirty and it is being clear for write back).

If it does call the sdf() while racing with write back then we
just redirtied the page just like clear_page_dirty_for_io() would
do if page_mkclean() failed so nothing harmful will come of that
neither. Page stays dirty despite write back it just means that
the page might be write back twice in a row.

Forgot to mention one thing, we had a discussion with Andrea and Jan
about set_page_dirty() and Andrea had the good idea of maybe doing
the set_page_dirty() at GUP time (when GUP with write) not when the
GUP user calls put_page(). We can do that by setting the dirty bit
in the pte for instance. They are few bonus of doing things that way:
      - amortize the cost of calling set_page_dirty() (ie one call for
        GUP and page_mkclean()
      - it is always safe to do so at GUP time (ie the pte has write
        permission and thus the page is in correct state)
      - safe from truncate race
      - no need to ever lock the page

Extra bonus from my point of view, it simplify thing for my generic
page protection patchset (KSM for file back page).

So maybe we should explore that ? It would also be a lot less code.

Yes, please. It sounds more sensible to me to dirty the page on get, not
on put.

I fully agree this is a desirable final state of affairs.

I'm glad to see this presented because it has crossed my mind more than once
that effectively a GUP pinned page should be considered "dirty" at all times
until the pin is removed.  This is especially true in the RDMA case.

But, what if the RDMA registration is readonly? That's not uncommon, and
marking dirty unconditonally would add needless overhead to such pages.

Yes and this is only when FOLL_WRITE is set ie when you are doing GUP and
asking for write. Doing GUP and asking for read is always safe.

Aha, ok great.

I guess it does introduce something for callers to be aware of, if
they GUP very large regions. I suppose if they're sufficiently aware
of the situation, e.g. pnfs LAYOUT_COMMIT notifications, they could
walk lists and reset page_dirty for untouched pages before releasing.
That's their issue though, and agreed it's safest for the GUP layer
to mark.

Tom.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux