On 12/4/18 3:03 PM, Dan Williams wrote:
> On Tue, Dec 4, 2018 at 1:56 PM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>>
>> On 12/4/18 12:28 PM, Dan Williams wrote:
>>> On Mon, Dec 3, 2018 at 4:17 PM <john.hubbard@xxxxxxxxx> wrote:
>>>>
>>>> From: John Hubbard <jhubbard@xxxxxxxxxx>
>>>>
>>>> Introduces put_user_page(), which simply calls put_page().
>>>> This provides a way to update all get_user_pages*() callers,
>>>> so that they call put_user_page(), instead of put_page().
>>>>
>>>> Also introduces put_user_pages(), and a few dirty/locked variations,
>>>> as a replacement for release_pages(), and also as a replacement
>>>> for open-coded loops that release multiple pages.
>>>> These may be used for subsequent performance improvements,
>>>> via batching of pages to be released.
>>>>
>>>> This is the first step of fixing the problem described in [1]. The steps
>>>> are:
>>>>
>>>> 1) (This patch): provide put_user_page*() routines, intended to be used
>>>> for releasing pages that were pinned via get_user_pages*().
>>>>
>>>> 2) Convert all of the call sites for get_user_pages*(), to
>>>> invoke put_user_page*(), instead of put_page(). This involves dozens of
>>>> call sites, and will take some time.
>>>>
>>>> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>>>> implement tracking of these pages. This tracking will be separate from
>>>> the existing struct page refcounting.
>>>>
>>>> 4) Use the tracking and identification of these pages, to implement
>>>> special handling (especially in writeback paths) when the pages are
>>>> backed by a filesystem. Again, [1] provides details as to why that is
>>>> desirable.
>>>
>>> I thought at Plumbers we talked about using a page bit to tag pages
>>> that have had their reference count elevated by get_user_pages()? That
>>> way there is no need to distinguish put_page() from put_user_page() it
>>> just happens internally to put_page().
>>> At the conference Matthew was
>>> offering to free up a page bit for this purpose.
>>>
>>
>> ...but then, upon further discussion in that same session, we realized that
>> that doesn't help. You need a reference count. Otherwise a random put_page
>> could affect your dma-pinned pages, etc, etc.
>
> Ok, sorry, I mis-remembered. So, you're effectively trying to capture
> the end of the page pin event separate from the final 'put' of the
> page? Makes sense.
>

Yes, that's it exactly.

>> I was not able to actually find any place where a single additional page
>> bit would help our situation, which is why this still uses LRU fields for
>> both the two bits required (the RFC [1] still applies), and the dma_pinned_count.
>
> Except the LRU fields are already in use for ZONE_DEVICE pages... how
> does this proposal interact with those?

Very badly: page->pgmap and page->hmm_data both get corrupted.

Is there an entire use case I'm missing: calling get_user_pages() on
ZONE_DEVICE pages? Said another way: is it reasonable to disallow calling
get_user_pages() on ZONE_DEVICE pages?

If we have to support get_user_pages() on ZONE_DEVICE pages, then the whole
LRU field approach is unusable.

thanks,
--
John Hubbard
NVIDIA
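The sketch below makes the distinction in this exchange concrete: the end of a get_user_pages() pin is tracked separately from the final put_page(). It is a minimal user-space model under stated assumptions, not kernel code. struct model_page, model_gup(), model_put_page(), model_put_user_page(), model_put_user_pages() and model_page_dma_pinned() are all hypothetical names. The separate dma_pinned_count stands in for the tracking planned in step 3, wherever that count ends up being stored. The model shows why a counter is needed rather than a single flag bit. With two overlapping pins, releasing one must leave the page marked as pinned. An unrelated get_page()/put_page() pair must not touch the pin state at all.

```c
/*
 * Hypothetical user-space model, not kernel code: it only illustrates
 * tracking get_user_pages() pins separately from the page refcount.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	int refcount;         /* stands in for the ordinary struct page refcount */
	int dma_pinned_count; /* hypothetical separate pin count (step 3) */
};

/* Models get_user_pages() on one page: takes a reference and a pin. */
void model_gup(struct model_page *page)
{
	page->refcount++;
	page->dma_pinned_count++;
}

/* Models put_page(): drops an ordinary reference and nothing else. */
void model_put_page(struct model_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Models put_user_page(): ends one pin, then drops its reference. */
void model_put_user_page(struct model_page *page)
{
	assert(page->dma_pinned_count > 0);
	page->dma_pinned_count--;
	model_put_page(page);
}

/* Models put_user_pages(): a plain loop over the array, with no batching. */
void model_put_user_pages(struct model_page **pages, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		model_put_user_page(pages[i]);
}

/* True while at least one get_user_pages() pin is outstanding. */
bool model_page_dma_pinned(const struct model_page *page)
{
	return page->dma_pinned_count > 0;
}
```

In this model, put_user_pages() is only a loop over put_user_page(). The patch description says the real routines may later batch the releases, and the sketch does not attempt that. It also ignores where the count would live. Keeping it in the LRU fields is exactly what collides with page->pgmap and page->hmm_data on ZONE_DEVICE pages.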