On Wed, Aug 28, 2024 at 08:34:00PM +0100, Matthew Wilcox wrote:
> Today it is the responsibility of each filesystem to maintain the mapping
> from file logical addresses to disk blocks (*). There are various ways
> to query that information, eg calling get_block() or using iomap.
>
> What if we pull that information up into the VFS? Filesystems obviously
> _control_ that information, so need to be able to invalidate entries.
> And we wouldn't want to store all extents in the VFS all the time, so
> would need to have a way to call into the filesystem to populate ranges
> of files. We'd need to decide how to lock/protect that information
> -- a per-file lock? A per-extent lock? No locking, just a seqcount?
> We need a COW bit in the extent which tells the user that this extent
> is fine for reading through, but if there's a write to be done then the
> filesystem needs to be asked to create a new extent.
>
> There are a few problems I think this can solve. One is efficient
> implementation of NFS READPLUS.

To expand on this, we're talking about the Linux NFS server's
implementation of the NFSv4.2 READ_PLUS operation, which is
specified here:

  https://www.rfc-editor.org/rfc/rfc7862.html#section-15.10

The READ_PLUS operation can return an array of content segments
that include regular data, holes in the file, or data patterns.
Knowing how the filesystem stores a file would help NFSD identify
where it can return a representation of a hole rather than a
string of actual zeroes, for instance.

> Another is the callback from iomap
> to the filesystem when doing buffered writeback. A third is having a
> common implementation of FIEMAP. I've heard rumours that FUSE would like
> something like this, and maybe there are other users that would crop up.
>
> Anyway, this is as far as my thinking has got on this topic for now.
> Maybe there's a good idea here, maybe it's all a huge overengineered mess
> waiting to happen.
> I'm sure other people know this area of filesystems
> better than I do.
>
> (*) For block device filesystems. Obviously network filesystems and
> synthetic filesystems don't care and can stop reading now. Umm, unless
> maybe they _want_ to use it, eg maybe there's a sharded thing going on and
> the fs wants to store information about each shard in the extent cache?
>

--
Chuck Lever
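
To make the READ_PLUS point above a bit more concrete, here is a rough
userspace sketch of splitting a file into data and hole segments, which
is roughly the shape of answer a READ_PLUS reply wants. It is only an
analogy: it uses lseek() with SEEK_DATA/SEEK_HOLE from userspace rather
than anything NFSD or a VFS extent cache would do in the kernel, and the
name content_segments() is made up for illustration. On filesystems that
do not report holes, the whole file simply comes back as one data segment.

```python
import errno
import os
import tempfile


def content_segments(fd, size):
    """Split [0, size) into ("data" | "hole", offset, length) runs.

    Hypothetical helper: asks the filesystem where data and holes are
    via lseek(SEEK_DATA / SEEK_HOLE) instead of reading and scanning
    for zeroes.
    """
    segments = []
    offset = 0
    while offset < size:
        try:
            data_start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as err:
            if err.errno != errno.ENXIO:
                raise
            # ENXIO: no data at or after offset, so the rest is a hole.
            segments.append(("hole", offset, size - offset))
            break
        if data_start > offset:
            segments.append(("hole", offset, data_start - offset))
        # End of this data run; EOF counts as an implicit hole.
        hole_start = min(os.lseek(fd, data_start, os.SEEK_HOLE), size)
        segments.append(("data", data_start, hole_start - data_start))
        offset = hole_start
    return segments


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"head")
        os.lseek(fd, 1 << 20, os.SEEK_SET)  # skip 1 MiB, leaving a gap
        os.write(fd, b"tail")
        for kind, off, length in content_segments(fd, os.fstat(fd).st_size):
            print(kind, off, length)
    finally:
        os.close(fd)
        os.unlink(path)
```

A server could send each "hole" run as a short hole descriptor and only
read and transmit the "data" runs. The exact segment boundaries depend on
the filesystem's allocation granularity, so the output is not fixed.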