Re: [RFC PATCH v1 1/7] fs: Add inode_get_ino() and implement get_ino() for NFS

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 21 Oct 2024 15:13:20 +0200

On Fri, Oct 18, 2024 at 02:25:43PM +0200, Jan Kara wrote:
> On Thu 17-10-24 16:21:34, Paul Moore wrote:
> > On Thu, Oct 17, 2024 at 1:05 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > On Thu, 2024-10-17 at 11:15 -0400, Paul Moore wrote:
> > > > On Thu, Oct 17, 2024 at 10:58 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> > > > > On Thu, Oct 17, 2024 at 10:54:12AM -0400, Paul Moore wrote:
> > > > > > Okay, good to know, but I was hoping that we could come up with
> > > > > > an explicit list of filesystems that maintain their own private inode
> > > > > > numbers outside of inode->i_ino.
> > > > >
> > > > > Anything using iget5_locked is a good start.  Add to that file systems
> > > > > implementing their own inode cache (at least xfs and bcachefs).
> > > >
> > > > Also good to know, thanks.  However, at this point the lack of a clear
> > > > answer is making me wonder a bit more about inode numbers in the view
> > > > of VFS developers; do you folks care about inode numbers?  I'm not
> > > > asking to start an argument, it's a genuine question so I can get a
> > > > better understanding about the durability and sustainability of
> > > > inode->i_ino.  If all of you (the VFS folks) aren't concerned about
> > > > inode numbers, I suspect we are going to have similar issues in the
> > > > future and we (the LSM folks) likely need to move away from reporting
> > > > inode numbers as they aren't reliably maintained by the VFS layer.
> > > >
> > >
> > > Like Christoph said, the kernel doesn't care much about inode numbers.
> > >
> > > People care about them though, and sometimes we have things in the
> > > kernel that report them in some fashion (tracepoints, procfiles, audit
> > > events, etc.). Having those match what the userland stat() st_ino field
> > > tells you is ideal, and for the most part that's the way it works.
> > >
> > > The main exception is when people use 32-bit interfaces (somewhat rare
> > > these days), or they have a 32-bit kernel with a filesystem that has a
> > > 64-bit inode number space (NFS being one of those). The NFS client has
> > > basically hacked around this for years by tracking its own fileid field
> > > in its inode.
> > 
> > When I asked if the VFS devs cared about inode numbers this is more of
> > what I was wondering about.  Regardless of whether the kernel itself uses
> > inode numbers for anything, it does appear that users do care about
> > inode numbers to some extent, and I wanted to know if the VFS devs
> > viewed the inode numbers as a first order UAPI interface/thing, or if
> > it was of lesser importance and not something the kernel was going to
> > provide much of a guarantee around.  Once again, I'm not asking this
> > to start a war, I'm just trying to get some perspective from the VFS
> > dev side of things.
> 
> Well, we do care to not break our users. So our opinion about "first order
> UAPI" doesn't matter that much. If userspace is using it, we have to
> avoid breaking it. And there definitely is userspace depending on st_ino +
> st_dev being a unique identifier of a file / directory so we want to maintain
> that as much as possible (at least as long as there's userspace depending
> on it which I don't see changing in the near future).
> 
> That being said, historically people have learned that NFS has its quirks
> (similarly, btrfs occasionally needs special treatment) and adapted to
> it. bcachefs is new enough that userspace hasn't noticed yet; that's going
> to be interesting.
> 
> There's another aspect: even 64 bits are starting to be expensive to pack
> things into for some filesystems (either due to external protocol
> constraints such as for AFS or due to the combination of features such as
> subvolumes, snapshotting, etc.). Going to 128 bits for everybody seems
> like a waste, so at the last LSF summit we discussed starting to push
> file handles (the output of name_to_handle_at(2)) as a replacement for
> st_ino as the file/dir identifier in a filesystem. For the kernel this
> would be convenient because each filesystem can pack there what it needs.
> But userspace guys were not thrilled by this (mainly due to the complexities
> of a dynamically sized identifier and passing it around). So this transition
> isn't currently getting much traction and we'll see how things evolve.

It's also not an answer for every filesystem. For example, you don't
want to use file handles for pidfds when you are guaranteed that the
inode numbers will be unique. So file handles will not be used there,
where a simple statx() and a comparison of inode numbers will do.
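
To illustrate the kind of comparison meant here, a minimal userspace
sketch follows. It is in Python for brevity and uses os.stat() rather
than statx(); the helper names file_identity() and same_file() are made
up for this example. It assumes the filesystem keeps st_dev + st_ino
unique, which is exactly the guarantee under discussion.

```python
import os
import tempfile


def file_identity(path):
    """Return the (st_dev, st_ino) pair that userspace commonly
    treats as the identity of a file or directory."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_file(path_a, path_b):
    """True if both paths resolve to the same device and inode number.

    Assumption: the filesystem reports unique st_ino values per st_dev.
    """
    return file_identity(path_a) == file_identity(path_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        c = os.path.join(d, "c")
        open(a, "w").close()
        open(c, "w").close()
        os.link(a, b)  # hard link: second name for the same inode

        print(same_file(a, b))
        print(same_file(a, c))
```

The same comparison works on open file descriptors by calling
os.fstat(fd) instead of os.stat(path).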