On Wed, Dec 13, 2023 at 09:23:18AM +1100, Dave Chinner wrote:
> On Wed, Dec 13, 2023 at 08:57:43AM +1100, NeilBrown wrote:
> > On Wed, 13 Dec 2023, Dave Chinner wrote:
> > > On Tue, Dec 12, 2023 at 09:15:29AM -0800, Frank Filz wrote:
> > > > > On Tue, Dec 12, 2023 at 10:10:23AM +0100, Donald Buczek wrote:
> > > > > > On 12/12/23 06:53, Dave Chinner wrote:
> > > > > > >
> > > > > > > So can someone please explain to me why we need to try to re-invent
> > > > > > > a generic filehandle concept in statx when we already have a
> > > > > > > working and widely supported user API that provides exactly this
> > > > > > > functionality?
> > > > > >
> > > > > > name_to_handle_at() is fine, but userspace could profit from being
> > > > > > able to retrieve the filehandle together with the other metadata in a
> > > > > > single system call.
> > > > >
> > > > > Can you say more? What, specifically, is the application that would
> > > > > want to do that, and is it really in such a hot path that it would be
> > > > > a user-visible improvement, let alone something that can actually be
> > > > > measured?
> > > >
> > > > A user space NFS server like Ganesha could benefit from getting attributes
> > > > and file handle in a single system call.
> > >
> > > At the cost of every other application that doesn't need those
> > > attributes.
> >
> > Why do you think there would be a cost?
>
> It's as much maintenance and testing cost as it is a runtime cost.
> We have to test and check this functionality works as advertised,
> and we have to maintain that in working order forever more. That's
> not free, especially if it is decided that the implementation needs
> to be hyper-optimised in each individual filesystem because of
> performance cost reasons.
>
> Indeed, even the runtime "do we need to fetch this information"
> checks have a measurable cost, especially as statx() is a very hot
> kernel path.
> We've been optimising branches out of things like
> setting up kiocbs because when that path is taken millions of times
> every second each logic branch that decides if something needs to be
> done or not has a direct measurable cost. statx() is a hot path that
> can be called millions of times a second.....

Like Neil mentioned, we won't even be fetching the fh if it wasn't
explicitly requested - and like I mentioned, we can avoid the
.encode_fh() call for local filesystems with a bit of work at the VFS
layer.

OTOH, when you're running rsync in incremental mode and detecting
hardlinks, your point that "statx can be called millions of times per
second" applies just as much to the additional name_to_handle_at()
call - we'd be nearly doubling the syscall overhead for scanning files
that don't need to be sent.

> And then comes the cost of encoding dynamically sized information in
> struct statx - filehandles are not fixed size - and statx is most
> definitely not set up or intended for dynamically sized attribute
> data. This adds more complexity to statx because it wasn't designed
> or intended to handle dynamically sized attributes. Optional
> attributes, yes, but not attributes that might vary in size from fs
> to fs or even inode type to inode type within a filesystem (e.g. dir
> filehandles can, optionally, encode the parent inode in them).

Since it looks like expanding statx is not going to be quite as easy as
hoped, I proposed elsewhere in the thread that we reserve a smaller
fixed-size field in statx (32 bytes) and set a flag if the handle won't
fit, indicating that userspace needs to fall back to
name_to_handle_at().

Stuffing a _dynamically_ sized attribute into statx would indeed be
painful - I believe we were always talking about a fixed-size buffer in
statx; the discussion's been over how big it needs to be...
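A minimal userspace sketch of the two options discussed above, assuming glibc 2.28 or later on Linux. stat_and_handle() is today's two-syscall path (statx() plus name_to_handle_at()). fill_fh_slot() models the proposed fixed 32-byte slot with an overflow flag. Everything named fh_slot, FH_FIXED_SZ, too_big, fill_fh_slot and stat_and_handle is hypothetical and made up for illustration. None of it is a real statx field, flag or kernel ABI.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* name_to_handle_at(), struct file_handle, MAX_HANDLE_SZ, AT_FDCWD */
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>   /* statx(), struct statx, STATX_BASIC_STATS */

/* HYPOTHETICAL: size of the fixed filehandle slot proposed for statx. */
#define FH_FIXED_SZ 32

/* HYPOTHETICAL stand-in for the proposed statx fields; not a real ABI. */
struct fh_slot {
    unsigned int  handle_bytes;        /* valid bytes in f_handle */
    int           handle_type;         /* fs-specific handle type */
    int           too_big;             /* set when the handle did not fit */
    unsigned char f_handle[FH_FIXED_SZ];
};

/* Models the proposal: copy the encoded handle if it fits in the fixed
 * slot, otherwise leave it empty and set too_big so the caller knows it
 * must fall back to name_to_handle_at(). */
void fill_fh_slot(struct fh_slot *slot, int type,
                  const unsigned char *fh, unsigned int len)
{
    assert(fh != NULL || len == 0);
    memset(slot, 0, sizeof(*slot));
    slot->handle_type = type;
    if (len > FH_FIXED_SZ) {
        slot->too_big = 1;
        return;
    }
    slot->handle_bytes = len;
    memcpy(slot->f_handle, fh, len);
}

/* Status quo: two syscalls per file, statx() for the attributes and
 * name_to_handle_at() for the handle. Returns 0 or -errno. On success
 * *fhp points to a malloc()ed handle the caller must free(). */
int stat_and_handle(const char *path, struct statx *stx,
                    struct file_handle **fhp, int *mount_id)
{
    struct file_handle *fh;

    if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, stx) != 0)
        return -errno;

    fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (!fh)
        return -ENOMEM;
    fh->handle_bytes = MAX_HANDLE_SZ;   /* in: buffer size, out: handle size */
    if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) != 0) {
        int err = -errno;
        free(fh);
        return err;
    }
    *fhp = fh;
    return 0;
}
```

In the proposed scheme, a caller that finds too_big set would do what stat_and_handle() does today. A caller whose handles fit in 32 bytes would skip the second syscall entirely. That is the per-file saving in the rsync/hardlink-scan case mentioned above.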