On Mon, Jun 18, 2018 at 12:31 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> On Mon, Jun 18, 2018 at 12:21:46PM -0700, Dan Williams wrote:
>> On Mon, Jun 18, 2018 at 11:14 AM, John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>> > On 06/18/2018 10:56 AM, Dan Williams wrote:
>> >> On Mon, Jun 18, 2018 at 10:50 AM, John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>> >>> On 06/18/2018 01:12 AM, Christoph Hellwig wrote:
>> >>>> On Sun, Jun 17, 2018 at 01:28:18PM -0700, John Hubbard wrote:
>> >>>>> Yes. However, my thinking was: get_user_pages() can become a way to indicate that
>> >>>>> these pages are going to be treated specially. In particular, the caller
>> >>>>> does not really want or need to support certain file operations, while the
>> >>>>> page is flagged this way.
>> >>>>>
>> >>>>> If necessary, we could add a new API call.
>> >>>>
>> >>>> That API call is called get_user_pages_longterm.
>> >>>
>> >>> OK...I had the impression that this was just semi-temporary API for dax, but
>> >>> given that it's an exported symbol, I guess it really is here to stay.
>> >>
>> >> The plan is to go back and provide api changes that bypass
>> >> get_user_pages_longterm() for RDMA. However, for VFIO and others, it's
>> >> not clear what we could do. In the VFIO case the guest would need to
>> >> be prepared to handle the revocation.
>> >
>> > OK, let's see if I understand that plan correctly:
>> >
>> > 1. Change RDMA users (this could be done entirely in the various device drivers'
>> > code, unless I'm overlooking something) to use mmu notifiers, and to do their
>> > DMA to/from non-pinned pages.
>>
>> The problem with this approach is surprising the RDMA drivers with
>> notifications of teardowns. It's the RDMA userspace applications that
>> need the notification, and it likely needs to be explicit opt-in, at
>> least for the non-ODP drivers.
>
> Well, more than that, we have no real plan on how to accomplish this,
> or any idea if it can even really work.. Most userspace give up
> control of the memory lifetime to the remote side of the connection
> and have no way to recover it other than a full teardown.
>
> Given that John is trying to fix a kernel oops, I don't think we
> should tie progress on it to the RDMA notification idea.
>
> .. and given that John is trying to fix a kernel oops, maybe the
> weird/bad/ugly behavior of ftruncate is a better bug to have than for
> unprivileged users to be able to oops the kernel???

Trading one bug for another is not a fix. We did not fix the
DAX-dma-vs-truncate bug by breaking truncate() guarantees.