Re: file handle in statx (was: Re: How to cope with subvolumes and snapshots on muti-user systems?)

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Tue, 12 Dec 2023 10:28:53 -0500

On Tue, Dec 12, 2023 at 10:35:40AM +0100, Christian Brauner wrote:
> On Tue, Dec 12, 2023 at 10:28:12AM +0100, Miklos Szeredi wrote:
> > On Tue, 12 Dec 2023 at 10:23, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Dec 12, 2023 at 09:10:47AM +0000, David Howells wrote:
> > > > Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > >
> > > > > > > > I suggest:
> > > > > > > >
> > > > > > > >  STATX_ATTR_INUM_NOT_UNIQUE - it is possible that two files have the
> > > > > > > >                               same inode number
> > > > >
> > > > > This is just ugly with questionable value. A constant reminder of how
> > > > > broken this is. Exposing the subvolume id also makes this somewhat redundant.
> > > >
> > > > There is a upcoming potential problem where even the 64-bit field I placed in
> > > > statx() may be insufficient.  The Auristor AFS server, for example, has a
> > > > 96-bit vnode ID, but I can't properly represent this in stx_ino.  Currently, I
> > >
> > > Is that vnode ID akin to a volume? Because if so you could just
> > > piggy-back on a subvolume id field in statx() and expose it there.
> > 
> > And how would exporting a subvolume ID and expecting userspace to take
> > that into account when checking for hard links be meaningfully
> > different than expecting userspace to retrieve the file handle and
> > compare that?
> > 
> > The latter would at least be a generic solution, including stacking fs
> > inodes, instead of tackling just a specific corner of the problem
> > space.
> 
> So taking a step back here, please. The original motivation for this
> discussion was restricted to handle btrfs - and now bcachefs as well.
> Both have a concept of a subvolume so it made sense to go that route.
> IOW, it wasn't originally a generic problem or pitched as such.
> 
> Would overlayfs be able to utilize an extended inode field as well?

No, the original motivation was not just btrfs and bcachefs; overlayfs
fundamentally needs to export a bigger identifier than the host
filesystems - pigeonhole principle, if anyone remembers their
combinatorics.

This applies to any filesystem which is under the hood reexporting from
other filesystems; there's a lot of stuff going on in container
filesystems right now, and I expect it'll come up there (you can create
new identifiers if you're exporting file by file, but not if it's
directory trees).

And Neal brought up NFS re-exports; I think there's a compelling
argument to be made that we ought to be able to round trip NFS
filehandles.