On Thu, Oct 12, 2017 at 11:57 PM, Christoph Hellwig <hch@xxxxxx> wrote: > On Thu, Oct 12, 2017 at 10:41:39AM -0700, Dan Williams wrote: >> So, you're jumping into this review at v9 where I've split the patches >> that take an initial MAP_DIRECT lease out from the patches that take >> FL_LAYOUT leases at memory registration time. You can see a previous >> attempt in "[PATCH v8 00/14] MAP_DIRECT for DAX RDMA and userspace >> flush" which should be in your inbox. > > The point is that your problem has absolutely nothing to do with mmap, > and all with get_user_pages. > > get_user_pages on DAX doesn't give the same guarantees as on pagecache > or anonymous memory, and that is the prbolem we need to fix. In fact > I'm pretty sure if we try hard enough (and we might have to try > very hard) we can see the same problem with plain direct I/O and without > any RDMA involved, e.g. do a larger direct I/O write to memory that is > mmap()ed from a DAX file, then truncate the DAX file and reallocate > the blocks, and we might corrupt that new file. We'll probably need > a special setup where there is little other chance but to reallocate > those used blocks. I'll take a harder look at this... > So what we need to do first is to fix get_user_pages vs unmapping > DAX mmap()ed blocks, be that from a hole punch, truncate, COW > operation, etc. > > Then we need to look into the special case of a long-living non-transient > get_user_pages that RDMA does - we can't just reject any truncate or > other operation for that, so that's where something like me layout > lease suggestion comes into play - but the call that should get the > least is not the mmap - it's the memory registration call that does > the get_user_pages. Yes, mmap is not the place to get the lease for a later get_user_pages, and my patches do take an additional lease at get_user_pages / MR init time. However, the mmap call has the file-descriptor for SIGIO the MR-init call does not. If we delay all of the setup it to MR time then we need to invent a notification scheme specific to RDMA which seems like a waste to me when we can generically signal an event on the fd for any event that effects any of the vma's on the file. The FL_LAYOUT lease impacts the entire file, so as far as I can see delaying the notification until MR-init is too late, too granular, and too RDMA specific.