On Mon, Aug 07, 2017 at 10:25:02AM +1000, Dave Chinner wrote:
> We've always told people not to do those "horrible abuses" because
> of the TOCTOU race conditions inherent in getting accurate
> BMAP/FIEMAP information to userspace. However, immutable extent maps
> solve the TOCTOU problem and so removes the only *technical* barrier
> in the way of using extent maps to implement functionality such as
> userspace pNFS servers.

For pNFS block/scsi and my upcoming RDMA persistent memory layout?
Hell no - we'll need concepts we can't expose to userspace for them,
and to expose the advanced functionality people are asking for
(reflinks, atomic updates, no stale data exposure) immutable extent
maps won't work at all.

> The core requirement for a userspace pNFS block server to be able to
> safely export the block map of a file to remote clients is that the
> extent map is allocated and will not change while the client has
> been granted access to it.

No. The core feature for the block layout is to create an unwritten
extent that we can expose to the client for writing, and to mark it
as written only after the commit, by converting the extent list.

Now I know you're going to argue that this could work with pre-zeroing
the extents, but for an actual SCSI or NVMe device that will suck
badly. And for RDMA-like layouts we don't even need the zeroing, as we
can control client behavior a lot better because memory registrations
allow much more fine-grained control. Either way we need a good
notification from the file system to the server when the extent map
changes.

But for either block or RDMA layouts, an implementation with the
filesystem in kernel space and the server in user space is stupid, as
they need to interact closely. There is a good reason why all
successful NFS products have the server very tightly coupled to the
file system, and a userspace <-> kernel barrier does not help with
that.
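
To make the unwritten-extent commit flow above concrete, here is a minimal
toy model in C. It is only a sketch, not kernel or NFS server code: the
names struct extent_map, map_alloc_unwritten(), map_read_exposes_data() and
map_commit() are hypothetical, and for brevity the commit only converts
extents that lie wholly inside the committed range (no extent splitting).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of the block layout extent life cycle. */
enum ext_state { EXT_UNWRITTEN, EXT_WRITTEN };

struct extent {
	uint64_t off;		/* file offset in bytes */
	uint64_t len;		/* length in bytes */
	enum ext_state state;
};

#define MAX_EXTENTS 16

struct extent_map {
	struct extent ext[MAX_EXTENTS];
	size_t count;
};

/*
 * Layout grant: space is allocated as unwritten, so the client can write
 * to the blocks while readers still see zeroes instead of stale data.
 */
static int map_alloc_unwritten(struct extent_map *m, uint64_t off, uint64_t len)
{
	if (len == 0 || m->count == MAX_EXTENTS)
		return -1;
	m->ext[m->count++] = (struct extent){ off, len, EXT_UNWRITTEN };
	return 0;
}

/* Returns 1 if a read at pos may return on-disk data, 0 if it must see zeroes. */
static int map_read_exposes_data(const struct extent_map *m, uint64_t pos)
{
	for (size_t i = 0; i < m->count; i++) {
		const struct extent *e = &m->ext[i];

		if (pos >= e->off && pos - e->off < e->len)
			return e->state == EXT_WRITTEN;
	}
	return 0;	/* hole: nothing allocated here */
}

/*
 * Commit: convert every unwritten extent that lies wholly inside
 * [off, off + len) to written. Returns the number of extents converted.
 */
static size_t map_commit(struct extent_map *m, uint64_t off, uint64_t len)
{
	size_t converted = 0;

	for (size_t i = 0; i < m->count; i++) {
		struct extent *e = &m->ext[i];

		assert(e->len > 0);
		if (e->state != EXT_UNWRITTEN)
			continue;
		/* Overflow-safe check that the extent fits in the range. */
		if (e->off >= off && e->len <= len &&
		    e->off - off <= len - e->len) {
			e->state = EXT_WRITTEN;
			converted++;
		}
	}
	return converted;
}
```

In this model a caller would grant a layout with map_alloc_unwritten(),
let the client write, and call map_commit() only once the client commits;
until then map_read_exposes_data() reports that readers must see zeroes.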