On Fri, Oct 13, 2017 at 10:01 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Oct 13, 2017 at 08:56:10AM -0700, Dan Williams wrote:
>> While implementing MAP_DIRECT, an mmap flag that arranges for an
>> FL_LAYOUT lease to be established, Al noted:
>>
>>     You are not even guaranteed that descriptor will remain be still
>>     open by the time you pass it down to your helper, nevermind the
>>     moment when event actually happens...
>>
>> The first problem can be solved with an fd{get,put} at mmap
>> {entry,exit}.
>
> Huh?  fdget() does *NOT* guarantee that descriptor won't get closed.  What
> it does is guarantee that struct file won't get closed under you, which
> is nowhere near the same thing.  And while we are at it, it certainly
> _is_ called by mmap()...
>
>> The second problem appears to be a general issue.
>>
>> Leases follow the lifetime of the inode, so it is possible for a lease
>> to be broken after the file is closed. When that happens userspace may
>> get a notification on a stale fd. Of course it is not recommended that a
>> process close a file descriptor with an active lease, but if it does we
>> should assume that the notification is not needed either. Walk leases at
>> close time and invalidate any pending fasync instances.
>
> What the hell is special about close(2) and not, e.g. dup2(2)?  Or execve(2)
> triggering close-on-exec, etc...  Besides, you are changing a user-visible
> behaviour here.  Suppose your process forks and the child closes all
> descriptors; should that stop SIGIO delivery to the parent?
>
> Let's step back for a minute; could you describe how the userland is supposed
> to use that thing?

MAP_DIRECT is meant as a way to safely pass DAX mappings of a file to
the RDMA sub-system, or any sub-system that follows a memory
registration design pattern. RDMA expects that once it has done
get_user_pages() it has exclusive access to the memory backing the
file mapping indefinitely.
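
Part of why the descriptor is an awkward handle for this: a mapping,
and anything registered through it, stays valid after the descriptor
that created it has been closed. A minimal userland sketch of that
follows. The function name and the scratch file under /tmp are made up
for illustration; it only uses plain mmap(2), dup(2) and close(2), and
has nothing MAP_DIRECT specific in it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a scratch file, close the descriptor used
 * for mmap(), then keep writing through the mapping.
 * Returns 0 if the write is visible through a second descriptor that
 * refers to the same open file, -1 on any failure.
 */
int mapping_outlives_close(void)
{
    char path[] = "/tmp/fd-lifetime-XXXXXX";
    char buf[16] = {0};
    int ok;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives on while it is open */

    if (ftruncate(fd, 4096) < 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int dupfd = dup(fd);                /* second number, same open file */
    if (dupfd < 0) {
        munmap(map, 4096);
        close(fd);
        return -1;
    }
    assert(dupfd != fd);

    close(fd);                          /* the descriptor is gone ... */
    strcpy(map, "still mapped");        /* ... the mapping is not */

    /* Read the data back through the surviving descriptor. */
    ok = pread(dupfd, buf, sizeof(buf) - 1, 0) > 0 &&
         strcmp(buf, "still mapped") == 0;

    munmap(map, 4096);
    close(dupfd);
    return ok ? 0 : -1;
}
```

Nothing the process does with the descriptor number after mmap()
affects the mapping, which is why any notification tied to that number
can end up pointing at a stale or reused fd.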

With page cache backed file mappings we can truncate and hole punch the
file at will, and the RDMA operations will continue to pages that are no
longer part of the file. Yes, that breaks coherency, but it otherwise
does not cause damage to unrelated file blocks. With DAX we do not have
the luxury of an indirect page for the RDMA to land on; the operations
go straight to file blocks in persistent memory.

With MAP_DIRECT the proposal is that when the RDMA memory registration
code sees 'vma_is_dax(vma) == true' it calls a new ->lease_direct()
vm_operation to take an FL_LAYOUT lease against the file to protect
against truncate / fallocate. Lease expiration triggers a callback to
redirect or shut down RDMA. The filesystem mmap implementation also
arranges for an FL_LAYOUT lease to be taken at mmap time, when the fd is
available, to set up a SIGIO notification.

If we don't take a lease at mmap time then we would need to develop a
notification mechanism that is specific to the RDMA code, and using
SIGIO on the mmap fd seemed a more generic solution to me.
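
For reference, the closest thing userland has today is the existing
fcntl(2) lease interface, so here is a rough sketch of the lease plus
signal setup using that as a stand-in. MAP_DIRECT and FL_LAYOUT leases
are not reachable this way; F_SETLEASE, F_GETLEASE and F_SETSIG are the
real fcntl commands, while the function name, the handler, the scratch
file and the choice of SIGRTMIN are assumptions made for illustration.

```c
#define _GNU_SOURCE                     /* F_SETLEASE, F_GETLEASE, F_SETSIG */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Descriptor reported by the most recent lease-break signal, or -1. */
static volatile sig_atomic_t lease_break_fd = -1;

/*
 * With F_SETSIG and an SA_SIGINFO handler, si_fd names the descriptor
 * of the leased file. That number is only meaningful while the process
 * still has the descriptor open.
 */
static void on_lease_break(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    lease_break_fd = info->si_fd;
}

/*
 * Hypothetical helper: take a write lease on a scratch file and route
 * lease-break notifications to a real-time signal.
 * Returns the lease type reported by F_GETLEASE, or -1 if any step
 * fails (some filesystems and sandboxes refuse leases).
 */
int take_lease_with_signal(void)
{
    char path[] = "/tmp/lease-demo-XXXXXX";
    struct sigaction sa = {0};
    int type = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    sa.sa_sigaction = on_lease_break;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGRTMIN, &sa, NULL) == 0 &&
        fcntl(fd, F_SETSIG, SIGRTMIN) == 0 &&
        fcntl(fd, F_SETLEASE, F_WRLCK) == 0) {
        type = fcntl(fd, F_GETLEASE);   /* F_WRLCK while the lease is held */
        fcntl(fd, F_SETLEASE, F_UNLCK); /* drop the lease again */
    }

    close(fd);
    unlink(path);
    return type;
}
```

The sketch never provokes a lease break, so the handler is only there
to show where the fd number would show up. The stale fd question from
the patch is exactly about what si_fd means once the process has closed
or reused that number.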