Re: [RFC 2/2] iomap: Support subpage size dirty tracking to improve write performance

David Howells <dhowells@xxxxxxxxxx> · Thu, 03 Nov 2022 14:51:10 +0000

Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> > filesystems right now.  Dave Howells' netfs infrastructure is trying
> > to solve the problem for everyone (and he's been looking at iomap as
> > inspiration for what he's doing).
> 
> Btw, I never understood why the network file systems don't just use
> iomap.  There is nothing block specific in the core iomap code.

It creates and submits bio structs all over the place, which seems to require
a blockdev.

Anyway, netfslib supports, or hopefully will support in the future, the
following:

 (1) Fscache.  netfslib will construct a read you're asking for from cached
     data and data from the server and stitch them together (where a folio may
     comprise pieces from more than one source), and then write the bits it
     read from the server out to the cache...  And handle content encryption
     for you such that the data stored in the cache is content-encrypted.

     On writeback, the dirty data must be written to both the cache (if you
     have one) and the server (if you're not in disconnected operation).

 (2) Disconnected operation.  netfslib will, in the future, handle storing
     data and changes in the cache and then sync'ing on reconnection of an
     object.

 (3) I want to hand persistent (for the life of an op) iov_iters to the
     filesystem so that the filesystem can, if it wants to, pass these to the
     kernel_sendmsg() and kernel_recvmsg() at the bottom.

     The aim is to get knowledge of pages out of the network filesystem
     entirely.  A network filesystem would then provide two basic hooks to the
     server: async direct read and async direct write.  netfslib will use
     these to access the pagecache on behalf of the filesystem.

 (4) Reads and writes might want to, or need to, be non-block-size aligned,
     for example if we have a byte-range file lock or a max block size (eg.
     rsize/wsize) set that's not a multiple of 512.

 (5) Compressed I/O.  You get back more data than you asked for and you want
     to paste the rest into the pagecache (if buffered) or discard it (if
     DIO).  Further, to make this work on write, we may need to hold on to
     pages on the sides of the one we modified to make sure we keep the right
     size blob of data to recompress and send back.

 (6) Larger cache block granularity.  One thing I want to explore is the
     ability to have blocks in the cache that are larger than PAGE_SIZE.  If I
     can't use the backing filesystem's knowledge of holes in a file, then I
     have to store my own metadata (ie. effectively build a filesystem on top
     of a filesystem).  To reduce the amount of metadata that I need, I can
     make the cache granule size larger.

     In both 5 and 6, netfslib gets to tell the VM layer to increase the size
     of the blob in readahead() - and then may have to forcibly keep the pages
     surrounding the page of interest if it gets modified in order to be able
     to write to the cache correctly, depending on how much integrity I want
     to try and keep in the cache.

 (7) Not-quite-direct-I/O.  cifs, for example, has a number of variations on
     read and write modes that are kind of but not quite direct I/O.
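
To make the arithmetic behind (4) and (6) concrete, here is a rough userspace
sketch, not netfslib code.  The struct and function names are made up for
illustration.  It assumes a request is first expanded outwards to
cache-granule boundaries and then chopped into subrequests no bigger than a
max I/O size (standing in for rsize/wsize) that need not be a multiple of 512.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte range, for illustration only; not a netfslib type. */
struct byte_range {
	uint64_t start;	/* Offset of the first byte */
	uint64_t len;	/* Number of bytes */
};

/*
 * Expand [start, start + len) outwards so that both ends land on cache
 * granule boundaries.  The granule may be any non-zero size, so it can be
 * larger than a page.
 */
struct byte_range expand_to_granule(struct byte_range req, uint64_t granule)
{
	struct byte_range out;
	uint64_t end = req.start + req.len;

	assert(granule != 0);
	out.start = req.start - req.start % granule;	   /* round down */
	end = (end + granule - 1) / granule * granule;	   /* round up */
	out.len = end - out.start;
	return out;
}

/*
 * Split a range into subrequests of at most max_io bytes each.  max_io need
 * not be a multiple of 512, so the subrequests may be unaligned.  At most
 * max_subs entries are stored in subs[]; the return value is the number of
 * subrequests the range needs in total.
 */
size_t split_by_max_io(struct byte_range r, uint64_t max_io,
		       struct byte_range *subs, size_t max_subs)
{
	uint64_t pos = r.start, end = r.start + r.len;
	size_t n = 0;

	assert(max_io != 0);
	while (pos < end) {
		uint64_t chunk = end - pos < max_io ? end - pos : max_io;

		if (n < max_subs) {
			subs[n].start = pos;
			subs[n].len = chunk;
		}
		n++;
		pos += chunk;
	}
	return n;
}
```

With a 256KiB granule, a 100-byte read at offset 5000 expands to the whole
first granule, and a max I/O size of 65000 then splits that into five
subrequests, the last of which is 2144 bytes.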

David