On Wed, 26 Feb 2025, Trond Myklebust wrote: > On Wed, 2025-02-26 at 13:09 +1100, NeilBrown wrote: > > On Tue, 25 Feb 2025, Trond Myklebust wrote: > > > On Mon, 2025-02-24 at 14:09 +1100, NeilBrown wrote: > > > > On Mon, 24 Feb 2025, Al Viro wrote: > > > > > On Mon, Feb 24, 2025 at 12:34:06PM +1100, NeilBrown wrote: > > > > > > On Sat, 22 Feb 2025, Al Viro wrote: > > > > > > > On Fri, Feb 21, 2025 at 10:36:30AM +1100, NeilBrown wrote: > > > > > > > > > > > > > > > +In general, filesystems which use d_instantiate_new() to > > > > > > > > install the new > > > > > > > > +inode can safely return NULL. Filesystems which may not > > > > > > > > have an I_NEW inode > > > > > > > > +should use d_drop();d_splice_alias() and return the > > > > > > > > result > > > > > > > > of the latter. > > > > > > > > > > > > > > IMO that's a bad pattern, _especially_ if you want to go > > > > > > > for > > > > > > > "in-update" > > > > > > > kind of stuff later. > > > > > > > > > > > > Agreed. I have a draft patch to change d_splice_alias() and > > > > > > d_exact_alias() to work on hashed dentrys. I thought it > > > > > > should > > > > > > go after > > > > > > these mkdir patches rather than before. > > > > > > > > > > Could you give a braindump on the things d_exact_alias() is > > > > > needed > > > > > for? > > > > > It's a recurring headache when doing ->d_name/->d_parent > > > > > audits; > > > > > see e.g. > > > > > https://lore.kernel.org/all/20241213080023.GI3387508@ZenIV/ for > > > > > related > > > > > mini-rant from the latest iteration. > > > > > > > > > > Proof of correctness is bloody awful; it feels like the > > > > > primitive > > > > > itself > > > > > is wrong, but I'd never been able to write anything concise > > > > > regarding > > > > > the things we really want there ;-/ > > > > > > > > > > > > > As I understand it, it is needed (or wanted) to handle the > > > > possibility > > > > of an inode becoming "stale" and then recovering. This could > > > > happen, > > > > for example, with a temporarily misconfigured NFS server. > > > > > > > > If ->d_revalidate gets a NFSERR_STALE from the server it will > > > > return > > > > '0' > > > > so lookup_fast() and others will call d_invalidate() which will > > > > d_drop() > > > > the dentry. There are other paths on which -ESTALE can result in > > > > d_drop(). > > > > > > > > If a subsequent attempt to "open" the name successfully finds the > > > > same > > > > inode we want to reuse the old dentry rather than create a new > > > > one. > > > > > > > > I don't really understand why. This code was added 20 years ago > > > > before > > > > git. > > > > It was introduced by > > > > > > > > commit 89a45174b6b32596ea98fa3f89a243e2c1188a01 > > > > Author: Trond Myklebust <trond.myklebust@xxxxxxxxxx> > > > > Date: Tue Jan 4 21:41:37 2005 +0100 > > > > > > > > VFS: Avoid dentry aliasing problems in filesystems like NFS, > > > > where > > > > inodes may be marked as stale in one instance (causing > > > > the > > > > dentry > > > > to be dropped) then re-enabled in the next instance. > > > > > > > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxx> > > > > > > > > in history.git > > > > > > > > Trond: do you have any memory of this? Can you explain what the > > > > symptom > > > > was that you wanted to fix? > > > > > > > > The original patch used d_add_unique() for lookup and atomic_open > > > > and > > > > readdir prime-dcache. We now only use it for v4 atomic_open. > > > > Maybe > > > > we > > > > don't need it at all? Or maybe we need to restore it to those > > > > other > > > > callers? > > > > > > > > > > 2005? That looks like it was trying to deal with the userspace NFS > > > server. I can't remember when it was given the ability to use the > > > inode > > > generation counter, but I'm pretty sure that in 2005 there were > > > plenty > > > of setups out there that had the older version that reused > > > filehandles > > > (due to inode number reuse). So you would get spurious ESTALE > > > errors > > > sometimes due to inode number reuse, sometimes because the > > > filehandle > > > fell out of the userspace NFS server's cache. > > > > So this was likely done to work-around known weaknesses in NFS > > servers > > at the time. > > > > The original d_add_unique() was used in nfs_lookup() > > nfs_atomic_lookup() > > and nfs_readdir_lookup() but the current descendent d_exact_alias() > > is > > only used in _nfs4_open_and_get_state() called only by nfs4_do_open() > > which is only used in nfs4_atomic_open() and nfs4_proc_create(). > > > > So the usage in 'lookup' and 'readdir' have fallen by the wayside > > with > > no apparent negative consequences. > > The old NFS servers have probably been fixed. > > > > So do you have any concerns with us discarding d_exact_alias() and > > only > > using d_splice_alias() in _nfs4_open_get_state() ?? > > > > AFAIK, there were never any NFSv4 servers in public use that mimicked > the quirks of the userspace NFSv2/NFSv3 server. So I'm thinking it > should be safe to retire d_exact_alias. Thanks. I'll submit a patch through the VFS tree as I have other VFS patches in the works that will depend on that so having them together would be good. Thanks, NeilBrown