Re: file handle in statx

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 13 Dec 2023 10:44:07 +1100

On Tue, Dec 12, 2023 at 05:39:27PM -0500, Kent Overstreet wrote:
> On Wed, Dec 13, 2023 at 09:23:18AM +1100, Dave Chinner wrote:
> > On Wed, Dec 13, 2023 at 08:57:43AM +1100, NeilBrown wrote:
> > > On Wed, 13 Dec 2023, Dave Chinner wrote:
> > > > On Tue, Dec 12, 2023 at 09:15:29AM -0800, Frank Filz wrote:
> > > > > > On Tue, Dec 12, 2023 at 10:10:23AM +0100, Donald Buczek wrote:
> > > > > > > On 12/12/23 06:53, Dave Chinner wrote:
> > > > > > >
> > > > > > > > So can someone please explain to me why we need to try to re-invent
> > > > > > > > a generic filehandle concept in statx when we already have a
> > > > > > > > working and widely supported user API that provides exactly this
> > > > > > > > functionality?
> > > > > > >
> > > > > > > name_to_handle_at() is fine, but userspace could profit from being
> > > > > > > able to retrieve the filehandle together with the other metadata in a
> > > > > > > single system call.
> > > > > > 
> > > > > > Can you say more?  What, specifically is the application that would
> > > > > > want to do that, and is it really in such a hot path that it would be
> > > > > > a user-visible improvement, let alone something that can actually be
> > > > > > measured?
> > > > > 
> > > > > A user space NFS server like Ganesha could benefit from getting attributes
> > > > > and file handle in a single system call.
> > > > 
> > > > At the cost of every other application that doesn't need those
> > > > attributes.
> > > 
> > > Why do you think there would be a cost?
> > 
> > It's as much maintenance and testing cost as it is a runtime cost.
> > We have to test and check this functionality works as advertised,
> > and we have to maintain that in working order forever more. That's
> > not free, especially if it is decided that the implementation needs
> > to be hyper-optimised in each individual filesystem because of
> > performance cost reasons.
> > 
> > Indeed, even the runtime "do we need to fetch this information"
> > checks have a measurable cost, especially as statx() is a very hot
> > kernel path. We've been optimising branches out of things like
> > setting up kiocbs because when that path is taken millions of times
> > every second each logic branch that decides if something needs to be
> > done or not has a direct measurable cost. statx() is a hot path that
> > can be called millions of times a second.....
> 
> Like Neil mentioned we won't even be fetching the fh if it wasn't
> explicitly requested - and like I mentioned, we can avoid the
> .encode_fh() call for local filesystems with a bit of work at the VFS
> layer.
> 
> OTOH, when you're running rsync in incremental mode, and detecting
> hardlinks, your point that "statx can be called millions of times per
> second" would apply just as much to the additional name_to_handle_at()
> call - we'd be nearly doubling their overhead for scanning files that
> don't need to be sent.

Hardlinked files are indicated by st_nlink > 1, not by requiring
userspace to store every st_ino/dev it sees and having to compare
the st_ino/dev of every newly stat()d inode against that ino/dev
cache.

We only need ino/dev/filehandles for hardlink path disambiguation.

IOWs, this use case does not need name_to_handle_at() for millions
of inodes - it is just needed on the regular file inodes that have
st_nlink > 1.

Hence even for workloads like rsync with hardlink detection, we
don't need filehandles for every inode being stat()d.  And that's
ignoring the fact that, outside of certain niche use cases,
hardlinks are rare.
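
A minimal sketch of that pattern in userspace C, assuming glibc 2.28 or
later for the statx() wrapper. The helper names needs_file_handle() and
scan_one() are made up for illustration and are not taken from rsync or
any other tool:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: only regular files with more than one link
 * are hardlink candidates that need disambiguating. */
bool needs_file_handle(const struct statx *stx)
{
	if (!(stx->stx_mask & STATX_TYPE) || !(stx->stx_mask & STATX_NLINK))
		return false;
	return S_ISREG(stx->stx_mode) && stx->stx_nlink > 1;
}

/* Hypothetical scan step: one statx() per path, and a second syscall
 * only for hardlink candidates. Returns a malloc()ed handle that the
 * caller frees, or NULL if no handle was needed or a call failed. */
struct file_handle *scan_one(int dirfd, const char *path, struct statx *stx)
{
	struct file_handle *fh;
	int mount_id;

	if (statx(dirfd, path, AT_SYMLINK_NOFOLLOW, STATX_BASIC_STATS, stx) < 0)
		return NULL;
	if (!needs_file_handle(stx))
		return NULL;	/* the common case: no extra syscall */

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh)
		return NULL;
	fh->handle_bytes = MAX_HANDLE_SZ;
	if (name_to_handle_at(dirfd, path, fh, &mount_id, 0) < 0) {
		free(fh);
		return NULL;
	}
	return fh;
}
```

With this shape the second syscall is only paid for on inodes that can
actually be hardlinked, not on every inode being scanned.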

I'm really struggling to see what filehandles in statx() actually
optimises in any meaningful manner....

> > And then comes the cost of encoding dynamically sized information in
> > struct statx - filehandles are not fixed size - and statx is most
> > definitely not set up or intended for dynamically sized attribute
> > data. This adds more complexity to statx because it wasn't designed
> > or intended to handle dynamically sized attributes. Optional
> > attributes, yes, but not attributes that might vary in size from fs
> > to fs or even inode type to inode type within a filesystem (e.g. dir
> > filehandles can, optionally, encode the parent inode in them).
> 
> Since it looks like expanding statx is not going to be quite as easy as
> hoped, I proposed elsewhere in the thread that we reserve a smaller
> fixed size in statx (32 bytes) and set a flag if it won't fit,
> indicating that userspace needs to fall back to name_to_handle_at().

struct btrfs_fid is 40 bytes in size. Sure, that's not all used for
name_to_handle_at(), but we already have in-kernel filehandles that
can optionally be configured to be bigger than 32 bytes...
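
For reference, userspace can already ask how big a handle a given
filesystem wants: calling name_to_handle_at() with handle_bytes set to
zero is documented to fail with EOVERFLOW and report the required size.
A rough sketch, where probe_handle_bytes() is a made-up name:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: returns the handle size in bytes that the
 * filesystem reports for this path, or -1 if it could not be probed
 * (e.g. the filesystem does not support file handles). */
int probe_handle_bytes(int dirfd, const char *path)
{
	struct file_handle probe = { .handle_bytes = 0 };
	int mount_id;

	/* With a zero-sized buffer the call is expected to fail. */
	if (name_to_handle_at(dirfd, path, &probe, &mount_id, 0) == 0)
		return -1;
	if (errno != EOVERFLOW)
		return -1;
	/* The kernel wrote the required size back into handle_bytes. */
	return (int)probe.handle_bytes;
}
```

Comparing that result against a proposed fixed slot size on the
filesystems you care about shows how often the fallback would be hit.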

> Stuffing a _dynamically_ sized attribute into statx would indeed be
> painful - I believe we were always talking about a fixed size buffer in
> statx, the discussion's been over how big it needs to be...

The contents of the buffer are still dynamically sized, so there's
still a length attribute that needs to be emitted to userspace with
the buffer.
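
To make that concrete, here is a purely hypothetical layout for such a
slot. Nothing like struct fh_slot exists in struct statx today; the
32-byte size comes from the proposal quoted above, and the field and
function names are invented for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FH_SLOT_BYTES 32	/* hypothetical fixed slot size */

/* Hypothetical: even a fixed-size slot needs a length field and an
 * overflow indication next to the data array. */
struct fh_slot {
	uint8_t len;		/* valid bytes in data[] */
	bool	overflow;	/* handle did not fit, use the fallback */
	uint8_t data[FH_SLOT_BYTES];
};

/* Hypothetical fill routine: copy the handle if it fits, otherwise
 * flag that the caller must fall back to name_to_handle_at(). */
void fh_slot_fill(struct fh_slot *slot, const void *handle, size_t handle_len)
{
	memset(slot, 0, sizeof(*slot));
	if (handle_len > FH_SLOT_BYTES) {
		slot->overflow = true;
		return;
	}
	slot->len = (uint8_t)handle_len;
	memcpy(slot->data, handle, handle_len);
}
```

Every consumer has to check both len and overflow before touching
data[], which is exactly the variable-size handling that a fixed
buffer was supposed to avoid.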

And then what happens with the next attribute that someone wants
statx() to expose that can be dynamically sized? Are we really
planning to allow the struct statx to be expanded indefinitely
with largely unused static data arrays?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx