Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better.

On Fri, 16 Jul 2021, Josef Bacik wrote:
> On 7/15/21 1:24 PM, Christoph Hellwig wrote:
> > On Thu, Jul 15, 2021 at 01:11:29PM -0400, Josef Bacik wrote:
> >> Because there's no alternative.  We need a way to tell userspace they've
> >> wandered into a different inode namespace.  There's no argument that what
> >> we're doing is ugly, but there's never been a clear "do X instead".  Just a
> >> lot of whinging that btrfs is broken.  This makes userspace happy and is
> >> simple and straightforward.  I'm open to alternatives, but there have been 0
> >> workable alternatives proposed in the last decade of complaining about it.
> > 
> > Make sure we cross a vfsmount when crossing the "st_dev" domain so
> > that it is properly reported.   Suggested many times and ignored all
> > the time because it requires a bit of work.
> > 
> 
> You keep telling me this but forgetting that I did all this work when you 
> originally suggested it.  The problem I ran into was the automount stuff 
> requires that we have a completely different superblock for every vfsmount. 
> This is fine for things like nfs or samba where the automount literally points 
> to a completely different mount, but doesn't work for btrfs where it's on the 
> same file system.  If you have 1000 subvolumes and run sync() you're going to 
> write the superblock 1000 times for the same file system.  You are going to 
> reclaim inodes on the same file system 1000 times.  You are going to reclaim 
> dcache on the same filesystem 1000 times.  You are also going to pin 1000 
> dentries/inodes into memory whenever you wander into these things because the 
> super is going to hold them open.
> 
> This is not a workable solution.  It's not a matter of simply tying into 
> existing infrastructure, we'd have to completely rework how the VFS deals with 
> this stuff in order to be reasonable.  And when I brought this up to Al he told 
> me I was insane and we absolutely had to have a different SB for every vfsmount, 
> which means we can't use vfsmount for this, which means we don't have any other 
> options.  Thanks,

When I was first looking at this, I thought that separate vfsmnts
and auto-mounting was the way to go "just like NFS".  NFS still shares a
lot between the multiple superblocks - certainly it shares the same
connection to the server.

But I dropped the idea when Bruce pointed out that nfsd is not set up to
export auto-mounted filesystems.  It needs to be able to find a
filesystem given a UUID (extracted from a filehandle), and it does this
by walking through the mount table to find one that matches.  So unless
all btrfs subvols were mounted all the time (which I wouldn't propose),
it would need major work to fix.
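
To make the lookup problem above concrete, here is a rough sketch of
"find the filesystem for this UUID by walking the mount table".  It is
not nfsd's code: the Mount type, the find_mount_by_uuid() helper and the
UUID strings are all hypothetical, chosen only to show why a subvolume
that is not currently mounted cannot be found this way.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Mount:
    # Hypothetical stand-in for one mount table entry (not a kernel struct).
    path: str
    fs_uuid: str


def find_mount_by_uuid(mount_table: Iterable[Mount], fs_uuid: str) -> Optional[Mount]:
    """Return the first mounted filesystem whose UUID matches, else None."""
    for mount in mount_table:
        if mount.fs_uuid == fs_uuid:
            return mount
    return None


# Illustrative table: only filesystems that are mounted right now appear here.
mount_table = [
    Mount("/", "uuid-root"),
    Mount("/export", "uuid-export"),
]
```

A filehandle carrying "uuid-export" resolves to /export, while one
carrying the UUID of a subvolume that has not been auto-mounted yet
yields None, which is the case that would need the major work.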

NFSv4 describes the fsid as having a "major" and "minor" component.
We've never treated these as having an important meaning - just extra
bits to encode uniqueness in.  Maybe we should have used "major" for the
vfsmnt, and kept "minor" for the subvol.....
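
As a sketch of that "what if", assuming the NFSv4 fsid is a pair of
64-bit values: the make_fsid() helper and the ids below are invented for
illustration and do not describe how nfsd builds fsids today.

```python
from typing import NamedTuple

MASK64 = (1 << 64) - 1  # each fsid component is treated as an unsigned 64-bit value


class Fsid(NamedTuple):
    major: int
    minor: int


def make_fsid(mount_id: int, subvol_id: int) -> Fsid:
    """Hypothetical split: 'major' identifies the vfsmnt, 'minor' the subvol."""
    return Fsid(mount_id & MASK64, subvol_id & MASK64)


# Two subvolumes reached through the same mount share 'major'
# but remain distinct filesystems as far as the client can tell.
a = make_fsid(7, 256)
b = make_fsid(7, 257)
```

With such a split, a client could in principle tell "same mount,
different subvol" (equal major, different minor) from "different mount".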

The idea of a single vfsmnt exposing multiple inode-name-spaces does
appeal to me.  The "st_dev" is just part of the name, and already a
fairly blurry part.  Thanks to bind mounts, multiple mounts can have the
same st_dev.  I see no intrinsic reason that a single mount should not
have multiple fsids, provided that a coherent picture is provided to
userspace which doesn't contain too many surprises.
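
For reference, this is roughly how userspace treats st_dev as "part of
the name": an object is identified by the (st_dev, st_ino) pair, and a
change of st_dev is read as a boundary.  A minimal sketch using only
os.stat(); the helper names are mine, not an existing API.

```python
import os


def file_identity(path: str) -> tuple:
    """Identity of an object as userspace tools see it: (st_dev, st_ino)."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_inode_namespace(path_a: str, path_b: str) -> bool:
    """True if both paths report the same st_dev.

    Inode numbers are only comparable when this holds; it says nothing
    about whether the two paths sit under the same mount.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Tools that walk a tree (du, find -xdev, tar and friends) rely on checks
of this shape, so whatever picture a single mount with several fsids
presents has to keep (st_dev, st_ino) unique for distinct objects.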

NeilBrown



