Re: [PATCH v2] BTRFS/NFSD: provide more unique inode number for btrfs export

"NeilBrown" <neilb@xxxxxxx> · Mon, 13 Sep 2021 10:43:44 +1000

On Sun, 12 Sep 2021, Amir Goldstein wrote:
> > Maybe what we really need is for a bunch of diverse filesystem
> > developers to get together and agree on some new common interface for
> > subvolume management, including coming up with some sort of definition
> > of what a subvolume "is".
> 
> Neil,
> 
> Seeing that LSF/MM is not expected to gather in the foreseeable future, would
> you like to submit this as a topic for discussion in LPC Filesystem MC [1]?
> I know this is last minute, but we've just extended the CFP deadline
> until Sep 15 (MC is on Sep 21), so if you post a proposal, I think we will
> be able to fit this session in the final schedule.

Thanks for the suggestion.  Maybe that is a good idea...  But I don't
personally find face-to-face interactions particularly useful - though
other people obviously do.  I need thinking time after receiving new
ideas, so I can be sure that I understand them properly.  Face-to-face
doesn't allow me that thinking time.

So: no, I won't be proposing anything for LPC.

> 
> Granted, I don't know how many of the stakeholders plan to attend
> the LPC Filesystem MC, but at least Josef should be there ;)
> 
> I do have one general question about the expected behavior -
> In his comment to the LWN article [2], Josef writes:
> 
> "The st_dev thing is unfortunate, but again is the result of a lack of
> interfaces.
>  Very early on we had problems with rsync wandering into snapshots and
>  copying loads of stuff. Find as well would get tripped up.
>  The way these tools figure out if they've wandered into another file system
>  is if the st_dev is different..."
> 
> If your plan goes through to export the main btrfs filesystem and
> subvolumes as a uniform st_dev namespace to the NFS client,
> what's to stop those old issues from re-emerging on NFS exported btrfs?

That comment from Josef was interesting...  It doesn't align with
commit 3394e1607eaf ("Btrfs: Give each subvol and snapshot their own anonymous devid")
when Chris Mason introduced the per-subvol device number with the
justification that:
    Each subvolume has its own private inode number space, and so we need
    to fill in different device numbers for each subvolume to avoid confusing
    applications.

But I understand that history can be messy and maybe there were several
justifications of which Josef remembers one and Chris reported
another.

If rsync did, in fact, wander into subvols and didn't get put off by the
duplicate inode numbers (like 'find' does), then it would still do that
when accessing btrfs over NFS.  This has always been the case.  Chris's
"fix" only affected local access; it didn't change NFS access at all.

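For reference, the st_dev check that Josef's comment describes can be
sketched roughly as follows.  This is only an illustration in Python of
the general technique (what 'find -xdev' or 'rsync -x' do in spirit), not
code from either tool; the names crosses_filesystem and
walk_one_filesystem are made up for this example.

```python
import os


def crosses_filesystem(start_dev: int, entry_dev: int) -> bool:
    """True if an entry's st_dev differs from that of the starting point."""
    return entry_dev != start_dev


def walk_one_filesystem(top: str):
    """Yield directories under 'top' without descending past an st_dev change.

    Hypothetical helper: a tool using this rule stays out of anything that
    reports a different st_dev, and walks into anything that reports the
    same one.
    """
    start_dev = os.lstat(top).st_dev
    stack = [top]
    while stack:
        current = stack.pop()
        yield current
        with os.scandir(current) as entries:
            for entry in entries:
                if not entry.is_dir(follow_symlinks=False):
                    continue
                entry_dev = entry.stat(follow_symlinks=False).st_dev
                if not crosses_filesystem(start_dev, entry_dev):
                    stack.append(entry.path)
```

With a rule like this, whether a subvolume is entered depends entirely on
the st_dev the client sees, which is why the local and NFS cases can
behave differently.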
> 
> IOW, the user experience you are trying to solve is inability of 'find'
> to traverse the unified btrfs namespace, but Josef's comment indicates
> that some users were explicitly unhappy with 'find' trying to traverse
> into subvolumes to begin with.

I believe that even 12 years ago, find would have complained if it saw a
directory with the same inode as an ancestor.  Chris's fix wouldn't
prevent find from entering in that case, because it wouldn't enter
anyway.
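
The loop check referred to above can be sketched like this.  It is a
simplified illustration, not find's actual implementation: a directory is
treated as a loop if its (st_dev, st_ino) pair matches one of its
ancestors.  The device numbers are invented; 256 is the inode number that
btrfs gives to the root directory of a subvolume.

```python
from typing import NamedTuple, Sequence


class DirId(NamedTuple):
    """Identity of a directory as a tool sees it through stat()."""
    st_dev: int
    st_ino: int


def looks_like_loop(ancestors: Sequence[DirId], candidate: DirId) -> bool:
    """True if 'candidate' has the same (st_dev, st_ino) as any ancestor."""
    return candidate in ancestors


# Path walked so far: top-level subvolume root, then an ordinary directory.
ancestors = [DirId(st_dev=40, st_ino=256), DirId(st_dev=40, st_ino=300)]

# Nested subvolume root reported with the SAME st_dev (illustrative values):
# it collides with the top-level root, so the tool refuses to enter.
same_dev = looks_like_loop(ancestors, DirId(st_dev=40, st_ino=256))

# Nested subvolume root reported with its OWN anonymous st_dev: no collision.
own_dev = looks_like_loop(ancestors, DirId(st_dev=41, st_ino=256))
```

Here same_dev is True and own_dev is False, which is the difference
between a subvolume root that shares the parent's device number and one
that has its own.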

> 
> So is there really a globally expected user experience?

No.  Everybody wants what they want.  There is some overlap, but no
guarantees.  That is the unavoidable consequence of ignoring standards
when implementing functionality.

> If not, then I really don't see how an nfs export option can be avoided.

And I really don't see how an nfs export option would help...  Different
people within an organisation and using the same export might have
different expectations.

Thanks,
NeilBrown

> 
> Thanks,
> Amir.
> 
> [1] https://www.linuxplumbersconf.org/event/11/page/104-accepted-microconferences#cont-filesys
> [2] https://lwn.net/Articles/867509/