Re: [PATCH v2 1/5] fs, xfs: introduce S_IOMAP_IMMUTABLE

Christoph Hellwig <hch@xxxxxx> · Fri, 11 Aug 2017 12:34:10 +0200

On Mon, Aug 07, 2017 at 10:25:02AM +1000, Dave Chinner wrote:
> We've always told people not to do those "horrible abuses" because
> of the TOCTOU race conditions inherent in getting accurate
> BMAP/FIEMAP information to userspace. However, immutable extent maps
> solve the TOCTOU problem and so removes the only *technical* barrier
> in the way of using extent maps to implement functionality such as
> userspace pNFS servers.

For pNFS block/scsi and my upcoming RDMA persistent memory layout?
Hell no - we'll need concepts we can't expose to userspace for them,
and to expose the advanced functionality people are asking for
(reflinks, atomic updates, no stale data exposure) immutable extent
maps won't work at all.

> The core requirement for a userspace pNFS block server to be able to
> safely export the block map of a file to remote clients is that the
> extent map is allocated and will not change while the client has
> been granted access to it.

No.  The core feature for the block layout is to create an unwritten
extent that we can expose to the client for writing, and to only mark
it as written after commit by converting the extent list.
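
To make the unwritten -> written life cycle concrete, here is a toy
userspace model of it.  This is only a sketch: the toy_* names and the
struct are made up for illustration and are not kernel, XFS or nfsd
interfaces, and "commit" stands in for what pNFS calls LAYOUTCOMMIT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical names, not kernel or XFS APIs. */
enum ext_state { EXT_UNWRITTEN, EXT_WRITTEN };

struct toy_extent {
	enum ext_state state;
	unsigned char data[16];	/* stands in for the blocks on disk */
};

/* Allocate: blocks may hold stale data, but the extent is unwritten. */
static void toy_alloc(struct toy_extent *e)
{
	memset(e->data, 0xAA, sizeof(e->data));	/* pretend stale contents */
	e->state = EXT_UNWRITTEN;
}

/* The client writes straight to the blocks; extent state is untouched. */
static void toy_client_write(struct toy_extent *e, const void *buf, size_t len)
{
	assert(len <= sizeof(e->data));
	memcpy(e->data, buf, len);
}

/* Reads through the file system see zeroes until the extent is written. */
static void toy_read(const struct toy_extent *e, void *out, size_t len)
{
	assert(len <= sizeof(e->data));
	if (e->state == EXT_UNWRITTEN)
		memset(out, 0, len);
	else
		memcpy(out, e->data, len);
}

/* Commit: convert the extent from unwritten to written. */
static void toy_commit(struct toy_extent *e)
{
	e->state = EXT_WRITTEN;
}
```

In this model a reader never sees the stale 0xAA bytes or the client's
uncommitted data: toy_read() returns zeroes until toy_commit() has
converted the extent, and no pre-zeroing of the blocks is required.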

Now I know you're going to argue that this could work with pre-zeroing
the extents, but for an actual SCSI or NVMe device that will suck
badly.  And for RDMA-like layouts we don't even need the zeroing as
we can control client behavior a lot better because memory registrations
allow much more fine grained control.

Either way we need a good notification from the file system to the
server when the extent map changes.

But for either the block or the RDMA layout, an implementation with the
file system in kernel space and the server in userspace is stupid, as
they need to interact closely.  There is a good reason why all
successful NFS products have the server very tightly coupled to the
file system, and a userspace <-> kernel barrier does not help with that.