On 06/18/2018 12:21 PM, Dan Williams wrote:
> On Mon, Jun 18, 2018 at 11:14 AM, John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>> On 06/18/2018 10:56 AM, Dan Williams wrote:
>>> On Mon, Jun 18, 2018 at 10:50 AM, John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>>>> On 06/18/2018 01:12 AM, Christoph Hellwig wrote:
>>>>> On Sun, Jun 17, 2018 at 01:28:18PM -0700, John Hubbard wrote:
>>>>>> Yes. However, my thinking was: get_user_pages() can become a way to indicate that
>>>>>> these pages are going to be treated specially. In particular, the caller
>>>>>> does not really want or need to support certain file operations, while the
>>>>>> page is flagged this way.
>>>>>>
>>>>>> If necessary, we could add a new API call.
>>>>>
>>>>> That API call is called get_user_pages_longterm.
>>>>
>>>> OK...I had the impression that this was just a semi-temporary API for dax, but
>>>> given that it's an exported symbol, I guess it really is here to stay.
>>>
>>> The plan is to go back and provide api changes that bypass
>>> get_user_pages_longterm() for RDMA. However, for VFIO and others, it's
>>> not clear what we could do. In the VFIO case the guest would need to
>>> be prepared to handle the revocation.
>>
>> OK, let's see if I understand that plan correctly:
>>
>> 1. Change RDMA users (this could be done entirely in the various device drivers'
>> code, unless I'm overlooking something) to use mmu notifiers, and to do their
>> DMA to/from non-pinned pages.
>
> The problem with this approach is surprising the RDMA drivers with
> notifications of teardowns. It's the RDMA userspace applications that
> need the notification, and it likely needs to be explicit opt-in, at
> least for the non-ODP drivers.
>
>> 2. Return early from get_user_pages_longterm, if the memory is...marked for
>> RDMA? (How? Same sort of page flag that I'm floating here, or something else?)
>> That would avoid the problem with pinned pages getting their buffer heads
>> removed--by disallowing the pinning. Makes sense.
>
> Well, right now the RDMA workaround is DAX specific and it seems we
> need to generalize it for the page-cache case. One thought is to have
> try_to_unmap() take its own reference and wait for the page reference
> count to drop to one so that the truncate path knows the page is
> dma-idle and disconnected from the page cache, but I have not looked
> at the details.
>
>> Also, is there anything I can help with here, so that things can happen sooner?
>
> I do think we should explore a page flag for pages that are "long
> term" pinned. Michal asked for something along these lines at LSF / MM
> so that the core-mm can give up on pages that the kernel has lost
> lifetime control of. Michal, did I capture your ask correctly?

OK, that "refcount == 1" approach sounds promising:

   -- still use a page flag, but narrow the scope to get_user_pages_longterm() pages
   -- just wait in try_to_unmap, instead of giving up

I'll look into it, while waiting for Michal's thoughts on this.
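
To make the "refcount == 1" idea a bit more concrete, here is a rough userspace sketch of the two pieces: a flag set only by the long-term pin path, and an unmap-side wait that polls until the reference count drops back to one. Everything in it is an assumption for illustration: struct fake_page and the fake_* helpers are hypothetical names, not kernel APIs, the model allows only one long-term pinner at a time, and a real implementation would sleep on a waitqueue rather than poll.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct fake_page {
	atomic_int refcount;       /* 1 == only the base reference remains */
	atomic_bool longterm_pin;  /* hypothetical "long-term pinned" flag */
};

void fake_page_init(struct fake_page *p)
{
	atomic_init(&p->refcount, 1);
	atomic_init(&p->longterm_pin, false);
}

/*
 * Models the get_user_pages_longterm() side: take a reference and
 * mark the page. Assumes a single long-term pinner at a time.
 */
void fake_pin_longterm(struct fake_page *p)
{
	atomic_fetch_add(&p->refcount, 1);
	atomic_store(&p->longterm_pin, true);
}

/* Models releasing the long-term pin: clear the flag, drop the reference. */
void fake_unpin_longterm(struct fake_page *p)
{
	atomic_store(&p->longterm_pin, false);
	int old = atomic_fetch_sub(&p->refcount, 1);
	assert(old > 1); /* the base reference must never be dropped here */
}

/*
 * Models the wait on the unmap/truncate side: poll up to max_polls
 * times for the refcount to reach one ("dma-idle"). Returns true if
 * the page was seen idle, false if the poll budget ran out.
 */
bool fake_wait_dma_idle(struct fake_page *p, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (atomic_load(&p->refcount) == 1)
			return true;
	}
	return false;
}
```

In this model, a caller on the truncate side would call fake_wait_dma_idle() before touching the page, and would only proceed once it returns true. The bounded poll is only there to keep the sketch testable without threads.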