Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export

"NeilBrown" <neilb@xxxxxxx> · Wed, 18 Aug 2021 07:39:31 +1000

On Wed, 18 Aug 2021, kreijack@xxxxxxxxx wrote:
> On 8/15/21 11:53 PM, NeilBrown wrote:
> > On Mon, 16 Aug 2021, kreijack@xxxxxxxxx wrote:
> >> On 8/15/21 9:35 PM, Roman Mamedov wrote:

> >>
> >> However looking at the 'exports' man page, it seems that NFS has already an
> >> option to cover these cases: 'crossmnt'.
> >>
> >> If NFSd detects a "child" filesystem (i.e. a filesystem mounted inside an already
> >> exported one) and the "parent" filesystem is marked as 'crossmnt',  the client mount
> >> the parent AND the child filesystem with two separate mounts, so there is not problem of inode collision.
> > 
> > As you acknowledged, you haven't read the whole back-story.  Maybe you
> > should.
> > 
> > https://lore.kernel.org/linux-nfs/20210613115313.BC59.409509F4@xxxxxxxxxxxx/
> > https://lore.kernel.org/linux-nfs/162848123483.25823.15844774651164477866.stgit@noble.brown/
> > https://lore.kernel.org/linux-btrfs/162742539595.32498.13687924366155737575.stgit@noble.brown/
> > 
> > The flow of conversation does sometimes jump between threads.
> > 
> > I'm very happy to respond you questions after you've absorbed all that.
> 
> Hi Neil,
> 
> I read the other threads.  And I still have the opinion that the nfsd
> crossmnt behavior should be a good solution for the btrfs subvolumes. 

Thanks for reading it all.  Let me join the dots for you.

"crossmnt" doesn't currently work because "subvolumes" aren't mount
points.

We could change btrfs so that subvolumes *are* mountpoints.  They would
have to be automounts.  I posted patches to do that.  They were broadly
rejected because people have many thousands of submounts that are
concurrently active and so /proc/mounts would be multiple megabytes is
size and working with it would become impractical.  Also, non-privileged
users can create subvols, and may want the path names to remain private.
But these subvols would appear in the mount table and so would no longer
be private.

Alternately we could change the "crossmnt" functionality to treat a
change of st_dev as though it were a mount point.  I posted patches to
do this too.  This hits the same sort of problems in a different way.
If NFSD reports that is has crossed a "mount" by providing a different
filesystem-id to the client, then the client will create a new mount
point which will appear in /proc/mounts.  It might be less likely that
many thousands of subvolumes are accessed over NFS than locally, but it
is still entirely possible.  I don't want the NFS client to suffer a
problem that btrfs doesn't impose locally.  And 'private' subvolumes
could again appear on a public list if they were accessed via NFS.

Thanks,
NeilBrown