On Thu, Sep 15, 2022 at 10:16:25AM +0200, Jan Kara wrote:

> > How would that work?  What protects the area where you want to avoid running
> > into pinned pages from previously acceptable page getting pinned?  If "they
> > must have been successfully unmapped" is a part of what you are planning, we
> > really do have a problem...
>
> But this is a very good question. So far the idea was that we lock the
> page, unmap (or writeprotect) the page, and then check pincount == 0 and
> that is a reliable method for making sure page data is stable (until we
> unlock the page & release other locks blocking page faults and writes). But
> once suddenly ordinary page references can be used to create pins this
> does not work anymore. Hrm.
>
> Just brainstorming ideas now: So we'd either need to obtain the pins early
> when we still have the virtual address (but I guess that is often not
> practical but should work e.g. for normal direct IO path) or we need some
> way to "simulate" the page fault when pinning the page, just don't map it
> into page tables in the end. This simulated page fault could be perhaps
> avoided if rmap walk shows that the page is already mapped somewhere with
> suitable permissions.

OK.  As far as I can see, the rules are along the lines of
	* creator of ITER_BVEC/ITER_XARRAY is responsible for pages being safe.
	  That includes
		* page known to be locked by caller
		* page being privately allocated and not visible to anyone else
		* iterator being data source
		* page coming from pin_user_pages(), possibly as the result of
		  iov_iter_pin_pages() on ITER_IOVEC/ITER_UBUF.
	* ITER_PIPE pages are always safe
	* pages found in ITER_BVEC/ITER_XARRAY are safe, since the iterator
	  had been created with such.

My preference would be to have iov_iter_get_pages() and friends pin if and
only if we have data-destination iov_iter that is user-backed.  For
data-source user-backed we only need FOLL_GET, and for all other flavours
(ITER_BVEC, etc.)
we only do get_page(), if we need to grab any references at all.

What I'd like to have is the understanding of the places where we drop
the references acquired by iov_iter_get_pages().  How do we decide
whether to unpin?  E.g. pipe_buffer carries a reference to page and no
way to tell whether it's a pinned one; results of iov_iter_get_pages()
on ITER_IOVEC *can* end up there, but thankfully only from data-source
(== WRITE, aka. ITER_SOURCE) iov_iter.  So for those we don't care.

Then there's nfs_request; AFAICS, we do need to pin the references in
those if they are coming from nfs_direct_read_schedule_iovec(), but not
if they come from readpage_async_filler().  How do we deal with
coalescence, etc.?  It's been a long time since I really looked at that
code...  Christoph, could you give any comments on that one?

Note, BTW, that nfs_request coming from readpage_async_filler() have
pages locked by caller; the ones from nfs_direct_read_schedule_iovec()
do not, and that's where we want them pinned.  Resulting page references
end up (after quite a trip through data structures) stuffed into
struct rpc_rqst ->rq_rcv_buf.pages[] and when a response arrives from
server, they get picked by xs_read_bvec() and fed to iov_iter_bvec().
In one case it's safe since the pages are locked; in the other, since
they would come from pin_user_pages().

The call chain at the time they are used has nothing to do with the
originator: sunrpc is looking at the arrived response to READ that
matches an rpc_rqst that had been created by the sender of that request,
and safety is the sender's responsibility.
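
To make the preference stated above concrete (pin only for user-backed
data-destination iterators, FOLL_GET for user-backed data-source ones,
plain get_page() for everything else), here is a minimal userspace sketch
of that decision table.  It is only a model: the enum names and
ref_mode_for() are hypothetical and are not kernel API, and it says
nothing about how the release side would learn which kind of reference
it is holding.

```c
#include <stdbool.h>

/* Hypothetical model of the proposed rule; none of these names exist in
 * the kernel.  They only mirror the iterator flavours named in the mail. */
enum iter_kind { KIND_IOVEC, KIND_UBUF, KIND_BVEC, KIND_XARRAY, KIND_PIPE };
enum iter_dir  { DIR_SOURCE, DIR_DEST };  /* data source vs. data destination */
enum ref_mode  { REF_GET_PAGE, REF_FOLL_GET, REF_PIN };

/* ITER_IOVEC and ITER_UBUF are the user-backed flavours. */
bool user_backed(enum iter_kind kind)
{
	return kind == KIND_IOVEC || kind == KIND_UBUF;
}

/* Which kind of page reference the proposal would take, if any is taken. */
enum ref_mode ref_mode_for(enum iter_kind kind, enum iter_dir dir)
{
	if (!user_backed(kind))
		return REF_GET_PAGE;	/* creator of the iterator vouches for safety */
	if (dir == DIR_DEST)
		return REF_PIN;		/* pages will be written to: pin them */
	return REF_FOLL_GET;		/* pages are only read from */
}
```

With this model, ref_mode_for(KIND_IOVEC, DIR_DEST) yields REF_PIN, which
corresponds to the nfs_direct_read_schedule_iovec() case, while any
ITER_BVEC-style iterator yields REF_GET_PAGE regardless of direction.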