Re: file handle in statx

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Tue, 12 Dec 2023 19:00:15 -0500

On Wed, Dec 13, 2023 at 10:44:07AM +1100, Dave Chinner wrote:
> On Tue, Dec 12, 2023 at 05:39:27PM -0500, Kent Overstreet wrote:
> > Like Neal mentioned we won't even be fetching the fh if it wasn't
> > explicitly requested - and like I mentioned, we can avoid the
> > .encode_fh() call for local filesystems with a bit of work at the VFS
> > layer.
> > 
> > OTOH, when you're running rsync in incremental mode, and detecting
> > hardlinks, your point that "statx can be called millions of times per
> > second" would apply just as much to the additional name_to_handle_at()
> > call - we'd be nearly doubling their overhead for scanning files that
> > don't need to be sent.
> 
> Hardlinked files are indicated by st_nlink > 1, not by requiring
> userspace to store every st_ino/dev it sees and having to compare
> the st-ino/dev of every newly stat()d inode against that ino/dev
> cache.
> 
> We only need ino/dev/filehandles for hardlink path disambiguation.
> 
> IOWs, this use case does not need name_to_handle_at() for millions
> of inodes - it is just needed on the regular file inodes that have
> st_nlink > 1.

Ok yeah, that's a really good point. Perhaps name_to_handle_at() is
sufficient, then.

If so, maybe we can just add STATX_ATTR_INUM_NOT_UNIQUE and STATX_VOL
now, and leave STATX_HANDLE until someone discovers an application where
it actually does matter.
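
For reference, the scan pattern Dave describes might look roughly like
the sketch below: stat everything, but only pay for name_to_handle_at()
on regular files with st_nlink > 1. The helper names (needs_handle,
maybe_get_handle) are made up for illustration, and error handling is
minimal.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: only regular files with more than one link
 * need a handle for hardlink path disambiguation. */
static int needs_handle(const struct stat *st)
{
    return S_ISREG(st->st_mode) && st->st_nlink > 1;
}

/*
 * Hypothetical helper: stat the path, and fetch a file handle only if
 * it is a hardlink candidate.
 *
 * Returns 0 if no handle was needed (*out is NULL),
 *         1 if a handle was fetched (*out is malloc'd, caller frees),
 *        -1 on error (errno set by the failing call).
 */
static int maybe_get_handle(int dirfd, const char *path,
                            struct file_handle **out, int *mount_id)
{
    struct stat st;

    *out = NULL;
    if (fstatat(dirfd, path, &st, AT_SYMLINK_NOFOLLOW) < 0)
        return -1;

    /* The common case: one link, so no second syscall at all. */
    if (!needs_handle(&st))
        return 0;

    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (!fh)
        return -1;

    fh->handle_bytes = MAX_HANDLE_SZ;
    if (name_to_handle_at(dirfd, path, fh, mount_id, 0) < 0) {
        free(fh);
        return -1;
    }

    *out = fh;
    return 1;
}
```

A caller would then key its hardlink table on (mount_id, handle bytes)
only for the entries where this returns 1.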

> > > And then comes the cost of encoding dynamically sized information in
> > > struct statx - filehandles are not fixed size - and statx is most
> > > definitely not set up or intended for dynamically sized attribute
> > > data. This adds more complexity to statx because it wasn't designed
> > > or intended to handle dynamically sized attributes. Optional
> > > attributes, yes, but not attributes that might vary in size from fs
> > > to fs or even inode type to inode type within a filesystem (e.g. dir
> > > filehandles can, optionally, encode the parent inode in them).
> > 
> > Since it looks like expanding statx is not going to be quite as easy as
> > hoped, I proposed elsewhere in the thread that we reserve a smaller
> > fixed size in statx (32 bytes) and set a flag if it won't fit,
> > indicating that userspace needs to fall back to name_to_handle_at().
> 
> struct btrfs_fid is 40 bytes in size. Sure, that's not all used for
> name_to_handle_at(), but we already have in-kernel filehandles that
> can optionally be configured to be bigger than 32 bytes...

The hell is all that for!? They never reuse inode numbers, why are there
generation numbers in there? And do they not have inode -> dirent
backrefs?

> > Stuffing a _dynamically_ sized attribute into statx would indeed be
> > painful - I believe we were always talking about a fixed size buffer in
> > statx, the discussion's been over how big it needs to be...
> 
> The contents of the buffer is still dynamically sized, so there's
> still a length attribute that needs to be emitted to userspace with
> the buffer.

Correct
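
To make that concrete, here is a rough userspace sketch of a fixed-size
field carrying a length plus a "didn't fit" flag. None of this is a real
or proposed statx layout; the struct, the field names and
sketch_fill_fh() are all hypothetical. Only the 32 byte size comes from
the proposal quoted above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The fixed inline size floated in this thread. */
#define SKETCH_FH_INLINE_MAX 32

/* Hypothetical layout, NOT an actual struct statx field. */
struct sketch_fh_field {
    uint8_t len;       /* number of valid bytes in data[] */
    uint8_t overflow;  /* 1: handle did not fit, use name_to_handle_at() */
    uint8_t data[SKETCH_FH_INLINE_MAX];
};

/*
 * Hypothetical fill routine: copy the handle inline if it fits,
 * otherwise set the overflow flag and leave data[] zeroed.
 * Returns true if the handle was stored inline.
 */
static bool sketch_fill_fh(struct sketch_fh_field *f,
                           const void *handle, size_t handle_len)
{
    memset(f, 0, sizeof(*f));

    if (handle_len > SKETCH_FH_INLINE_MAX) {
        f->overflow = 1;
        return false;
    }

    memcpy(f->data, handle, handle_len);
    f->len = (uint8_t)handle_len;
    return true;
}
```

With this shape a 40 byte handle, like the struct btrfs_fid size
mentioned above, would set the overflow flag and userspace would make
the extra name_to_handle_at() call.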

> And then what happens with the next attribute that someone wants
> statx() to expose that can be dynamically sized? Are we really
> planning to allow the struct statx to be expanded indefinitely
> with largely unused static data arrays?

Well, struct stat/statx is not a long lived object that anyone would
ever keep a lot of around; it's a short lived object that just needs to
be efficient to access and ABI stable, so yes, if this comes up again
that's what we should do.

The alternative would be adding fields with an [ offset, length ] scheme
and treating the statx buffer as a bump allocator, but simple and fast
to access beats space efficiency here...
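
For comparison, the [ offset, length ] alternative might look something
like the sketch below. Everything here is hypothetical (the struct
names, the 128 byte tail, the helpers); it is only meant to show the
extra indirection on both the fill and the access side.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference to a variable-sized attribute in the tail. */
struct sketch_var_ref {
    uint16_t offset;  /* byte offset into tail[] */
    uint16_t length;  /* attribute length in bytes */
};

/* Hypothetical buffer: fixed header fields plus a shared tail area. */
struct sketch_statx_buf {
    struct sketch_var_ref fh;     /* e.g. a file handle */
    struct sketch_var_ref other;  /* some later variable-sized attribute */
    uint16_t used;                /* bump pointer: bytes of tail[] in use */
    uint8_t tail[128];
};

/*
 * Bump-allocate len bytes from the tail, copy src there and record
 * where it landed in *ref. Returns 0 on success, -1 if out of space.
 */
static int sketch_bump_put(struct sketch_statx_buf *b,
                           struct sketch_var_ref *ref,
                           const void *src, size_t len)
{
    if (len > sizeof(b->tail) - b->used)
        return -1;

    memcpy(b->tail + b->used, src, len);
    ref->offset = b->used;
    ref->length = (uint16_t)len;
    b->used += (uint16_t)len;
    return 0;
}

/* Accessing an attribute means chasing the offset first. */
static const void *sketch_bump_get(const struct sketch_statx_buf *b,
                                   const struct sketch_var_ref *ref)
{
    return b->tail + ref->offset;
}
```

The tail space is shared, so it's more compact, but every reader has to
go through the offset and bounds check the length instead of reading a
field at a fixed position.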