Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> > filesystems right now. Dave Howells' netfs infrastructure is trying
> > to solve the problem for everyone (and he's been looking at iomap as
> > inspiration for what he's doing).
>
> Btw, I never understod why the network file systems don't just use
> iomap. There is nothing block specific in the core iomap code.

It creates and submits bio structs all over the place. This seems to
require a blockdev.

Anyway, netfslib supports, or hopefully will support in the future, the
following:

 (1) Fscache. netfslib will construct the read you're asking for from
     cached data and data from the server and stitch them together (where
     a folio may comprise pieces from more than one source), and then
     write the bits it read from the server out to the cache. It will
     also handle content encryption for you, such that the data stored in
     the cache is content-encrypted.

     On writeback, the dirty data must be written to both the cache (if
     you have one) and the server (if you're not in disconnected
     operation).

 (2) Disconnected operation. netfslib will, in the future, handle storing
     data and changes in the cache and then sync'ing them on reconnection
     of an object.

 (3) I want to hand persistent (for the life of an op) iov_iters to the
     filesystem so that the filesystem can, if it wants to, pass these to
     kernel_sendmsg() and kernel_recvmsg() at the bottom.

     The aim is to get knowledge of pages out of the network filesystem
     entirely. A network filesystem would then provide two basic hooks to
     the server: an async direct read and an async direct write. netfslib
     will use these to access the pagecache on behalf of the filesystem.

 (4) Reads and writes might want or need to be non-block-size aligned:
     if we have a byte-range file lock, for example, or if we have a max
     block size (e.g. rsize/wsize) set that's not a multiple of 512.

 (5) Compressed I/O.
     You get back more data than you asked for, and you want to paste the
     rest into the pagecache (if buffered) or discard it (if DIO).

     Further, to make this work on write, we may need to hold on to the
     pages on either side of the one we modified to make sure we keep the
     right size blob of data to recompress and send back.

 (6) Larger cache block granularity. One thing I want to explore is the
     ability to have blocks in the cache that are larger than PAGE_SIZE.
     If I can't use the backing filesystem's knowledge of holes in a file,
     then I have to store my own metadata (i.e. effectively build a
     filesystem on top of a filesystem). To reduce the amount of metadata
     that I need, I can make the cache granule size larger.

     In both (5) and (6), netfslib gets to tell the VM layer to increase
     the size of the blob in readahead(), and may then have to forcibly
     keep the pages surrounding the page of interest if that page gets
     modified, in order to be able to write to the cache correctly,
     depending on how much integrity I want to try and keep in the cache.

 (7) Not-quite-direct-I/O. cifs, for example, has a number of variations
     on read and write modes that are kind of, but not quite, direct I/O.

David
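The read-planning arithmetic implied by points (1), (4) and (6) can be illustrated with a minimal userspace C sketch. This is not netfslib code: `struct subreq`, `expand_to_granules()`, `plan_read()` and the per-granule `cached` bitmap are hypothetical names chosen for illustration, and the real library works on folios and subrequests inside the kernel. The sketch rounds a read out to whole cache granules (the readahead expansion described for (5) and (6)), serves runs of granules already present in the cache from the cache, and slices the remaining runs into server reads of at most `rsize` bytes, where `rsize` need not be a multiple of 512.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Where one slice of a read is fetched from (hypothetical). */
enum src { SRC_CACHE, SRC_SERVER };

/* One slice of a read: a byte range and the source that supplies it. */
struct subreq {
	enum src source;
	uint64_t start;
	uint64_t len;
};

/*
 * Round the range [*start, *start + *len) outward to whole cache granules,
 * as readahead expansion might for compressed I/O or large cache granules.
 */
void expand_to_granules(uint64_t *start, uint64_t *len, uint64_t granule)
{
	uint64_t end = *start + *len;

	assert(granule > 0);
	*start -= *start % granule;
	end = (end + granule - 1) / granule * granule;
	*len = end - *start;
}

/*
 * Split a granule-aligned range into subrequests. cached[i] says whether
 * granule i is present in the cache and must cover every granule in the
 * range. A run of cached granules becomes one cache subrequest; a run of
 * uncached granules becomes server subrequests of at most rsize bytes.
 * Returns the number of subrequests written to out (at most max).
 */
size_t plan_read(uint64_t start, uint64_t len, uint64_t granule,
		 const bool *cached, uint64_t rsize,
		 struct subreq *out, size_t max)
{
	uint64_t pos = start, end = start + len;
	size_t n = 0;

	assert(granule > 0 && rsize > 0);
	assert(start % granule == 0 && len % granule == 0);
	while (pos < end && n < max) {
		bool in_cache = cached[pos / granule];
		uint64_t run_end = pos;

		/* Extend the run while granules come from the same source. */
		while (run_end < end && cached[run_end / granule] == in_cache)
			run_end += granule;

		if (in_cache) {
			out[n++] = (struct subreq){ SRC_CACHE, pos, run_end - pos };
			pos = run_end;
			continue;
		}
		/* Server reads are capped at rsize, whether or not it is aligned. */
		while (pos < run_end && n < max) {
			uint64_t chunk = run_end - pos < rsize ? run_end - pos : rsize;

			out[n++] = (struct subreq){ SRC_SERVER, pos, chunk };
			pos += chunk;
		}
	}
	return n;
}
```

For example, with 16 KiB granules, only granule 1 cached and an rsize of 10000, a 30000-byte read at offset 20000 expands to 49152 bytes at offset 16384 and is planned as one 16384-byte cache read followed by server reads of 10000, 10000, 10000 and 2768 bytes. Keeping the cache side granule-aligned while letting the server side use arbitrary sizes mirrors the split between cache granularity and rsize/wsize described above.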