> On Aug 18, 2022, at 2:31 PM, Anna Schumaker <anna@xxxxxxxxxx> wrote: > > Assuming nobody else has started working on the sparse-read function, > I would like to check my understanding of what it would look like: > > - The function splice_directo_to_actor() takes a "struct splice_desc" > as an argument. The sparse read function would take something similar, > a "struct sparse_desc", which has callbacks used by the underlying > filesystem to tell the server to encode either a hole or data segment > next. > > In the default case, the VFS tells the server to encode the entire > read range as data. If underlying filesystems opt-in, then whenever a > data extent is found the encode_data_segment() function is called and > whenever a hole is found it calls the encode_hole_segment() function. > > The NFS server would need to know file offset and length of each > segment, so these would probably be arguments to both functions. How > would reading data work, though? The server needs to reserve space in > the xdr stream for the offset & length metadata, but also the data > itself. I'm assuming it would do something similar to > fs/nfsd/vfs.c:nfsd_readv() and set up an iov_iter or kvec that the > underlying filesystem can use to place file data? > > Does that sound about right? Am I missing anything? I was expecting something more like what Dave originally described: >> IOWs, it seems to me that what READ_PLUS really wants is a "sparse >> read operation" from the filesystem rather than the current "read >> that fills holes with zeroes". i.e. a read operation that sets an >> iocb flag like RWF_SPARSE_READ to tell the filesystem to trim the >> read to just the ranges that contain data. >> >> That way the read populates the page cache over a single contiguous >> range of data and returns with the {offset, len} that spans the >> range that is read and mapped. The caller can then read that region >> out of the page cache and mark all the non-data regions as holes in >> whatever manner they need to. >> >> The iomap infrastructure that XFS and other filesystems use provide >> this exact "map only what contains data" capability - an iomap tells >> the page cache exactly what underlies the data range (hole, data, >> unwritten extents, etc) in an efficient manner, so it wouldn't be a >> huge stretch just to limit read IO ranges to those that contain only >> DATA extents. >> >> At this point READ_PLUS then just needs to iterate doing sparse >> reads and recording the ranges that return data as vector of some >> kind that is then passes to the encoding function to encode it as >> a sparse READ_PLUS data range.... But it's not clear how this would be made atomic, so maybe the callback mechanism you described is necessary. To make some progress on moving READ_PLUS out of EXPERIMENTAL status, can you build a READ_PLUS implementation that always returns a single CONTENT_DATA segment? We can merge that straight away, and it should perform on par with READ immediately. Then the bells and whistles can be added over time. -- Chuck Lever