Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems

David Disseldorp <ddiss@xxxxxxx> · Mon, 11 Sep 2023 15:35:15 +0200

Hi Dave,

On Mon, 11 Sep 2023 12:07:07 +1000, Dave Chinner wrote:

> On Sun, Sep 10, 2023 at 09:29:14PM -0400, Kent Overstreet wrote:
> > On Mon, Sep 11, 2023 at 11:05:09AM +1000, Dave Chinner wrote:  
> > > On Sat, Sep 09, 2023 at 06:42:30PM -0400, Kent Overstreet wrote:  
> > > > On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:  
> > > > > So why can't we figure out that easier way? What's wrong with trying to
> > > > > figure out if we can do some sort of helper or library set that assists
> > > > > supporting and porting older filesystems. If we can do that it will not
> > > > > only make the job of an old fs maintainer a lot easier, but it might
> > > > > just provide the stepping stones we need to encourage more people to climb
> > > > > up into the modern VFS world.  
> > > > 
> > > > What if we could run our existing filesystem code in userspace?  
> > > 
> > > You mean like lklfuse already enables?  
> > 
> > I'm not seeing that it does?
> > 
> > I just had a look at the code, and I don't see anything there related to
> > the VFS - AFAIK, a VFS -> fuse layer doesn't exist yet.  
> 
> Just to repeat what I said on #xfs here...
> 
> It doesn't try to cut in half way through the VFS -> filesystem
> path. It just redirects the fuse operations to "lkl syscalls" and so
> runs the entire kernel VFS->filesystem path.
> 
> https://github.com/lkl/linux/blob/master/tools/lkl/lklfuse.c
> 
> > And that looks a lot heavier than what we'd ideally want, i.e. a _lot_
> > more kernel code would be getting pulled in. The entire block layer,
> > probably the scheduler as well.  

The LKL block layer may also become useful for legacy storage support in
future, e.g. SCSI protocol obsolescence.

> Yes, but arguing that "performance sucks" misses the entire point of
> this discussion: that for the untrusted user mounts of untrusted
> filesystem images we already have a viable method for moving the
> dangerous processing out into userspace that requires almost *zero
> additional work* from anyone.

Indeed. Hajime and Octavian (cc'ed) have also made serious efforts to
get the LKL codebase in shape for mainline:
https://lore.kernel.org/linux-um/cover.1611103406.git.thehajime@xxxxxxxxx/

> As long as the performance of the lklfuse implementation doesn't
> totally suck, nobody will really care that much that it isn't quite as
> fast as a native implementation. Pluggable drives (e.g. via USB) are
> already going to be much slower than a host installed drive, so I
> don't think performance is even really a consideration for these
> sorts of use cases....
> 
> > What I've got in bcachefs-tools is a much thinner mapping from e.g.
> > kthreads -> pthreads, block layer -> aio, etc.  
> 
> Right, and we've got that in userspace for XFS, too. If we really
> cared that much about XFS-FUSE, I'd be converting userspace to use
> ublk w/ io_uring on top of a port of the kernel XFS buffer cache as
> the basis for a performant fuse implementation. However, there's a
> massive amount of userspace work needed to get a native XFS FUSE
> implementation up and running (even ignoring performance), so it's
> just not a viable short-term - or even medium-term - solution to the
> current problems.
> 
> Indeed, if you do a fuse->fs ops wrapper, I'd argue that lklfuse is
> the place to do it so that there is a single code base that supports
> all kernel filesystems without requiring anyone to support a
> separate userspace code base. Requiring every filesystem to do their
> own FUSE ports and then support them doesn't reduce the overall
> maintenance overhead burden on filesystem developers....

LKL is still implemented as a non-mmu architecture. The only fs-specific
downstream change that lklfuse depends on is non-mmu xfs_buf support:
https://lore.kernel.org/linux-xfs/1447800381-20167-1-git-send-email-octavian.purdila@xxxxxxxxx/

Does your lklfuse enthusiasm here imply that you'd be willing to
reconsider Octavian's earlier proposal for XFS non-mmu support?

Cheers, David