Re: [PATCH v3 6/6] NFSD: Repeal and replace the READ_PLUS implementation

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Fri, 19 Aug 2022 15:18:21 +0000

> On Aug 18, 2022, at 2:31 PM, Anna Schumaker <anna@xxxxxxxxxx> wrote:
> 
> Assuming nobody else has started working on the sparse-read function,
> I would like to check my understanding of what it would look like:
> 
> - The function splice_directo_to_actor() takes a "struct splice_desc"
> as an argument. The sparse read function would take something similar,
> a "struct sparse_desc", which has callbacks used by the underlying
> filesystem to tell the server to encode either a hole or data segment
> next.
> 
> In the default case, the VFS tells the server to encode the entire
> read range as data. If underlying filesystems opt-in, then whenever a
> data extent is found the encode_data_segment() function is called and
> whenever a hole is found it calls the encode_hole_segment() function.
> 
> The NFS server would need to know file offset and length of each
> segment, so these would probably be arguments to both functions. How
> would reading data work, though? The server needs to reserve space in
> the xdr stream for the offset & length metadata, but also the data
> itself. I'm assuming it would do something similar to
> fs/nfsd/vfs.c:nfsd_readv() and set up an iov_iter or kvec that the
> underlying filesystem can use to place file data?
> 
> Does that sound about right? Am I missing anything?

I was expecting something more like what Dave originally described:

>> IOWs, it seems to me that what READ_PLUS really wants is a "sparse
>> read operation" from the filesystem rather than the current "read
>> that fills holes with zeroes". i.e. a read operation that sets an
>> iocb flag like RWF_SPARSE_READ to tell the filesystem to trim the
>> read to just the ranges that contain data.
>> 
>> That way the read populates the page cache over a single contiguous
>> range of data and returns with the {offset, len} that spans the
>> range that is read and mapped. The caller can then read that region
>> out of the page cache and mark all the non-data regions as holes in
>> whatever manner they need to.
>> 
>> The iomap infrastructure that XFS and other filesystems use provide
>> this exact "map only what contains data" capability - an iomap tells
>> the page cache exactly what underlies the data range (hole, data,
>> unwritten extents, etc) in an efficient manner, so it wouldn't be a
>> huge stretch just to limit read IO ranges to those that contain only
>> DATA extents.
>> 
>> At this point READ_PLUS then just needs to iterate doing sparse
>> reads and recording the ranges that return data as vector of some
>> kind that is then passes to the encoding function to encode it as
>> a sparse READ_PLUS data range....

But it's not clear how this would be made atomic, so maybe the
callback mechanism you described is necessary.

To make some progress on moving READ_PLUS out of EXPERIMENTAL
status, can you build a READ_PLUS implementation that always
returns a single CONTENT_DATA segment? We can merge that straight
away, and it should perform on par with READ immediately.

Then the bells and whistles can be added over time.

--
Chuck Lever