On Tue, Aug 6, 2024 at 5:53 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > On Tue, Aug 06, 2024 at 04:46:28PM +0200, Mateusz Guzik wrote: > > > The flag thing is optional and can be dropped, but I think the general > > direction should be to add *more* asserts and whatnot (even if they are > > to land separately). A debug-only variant would not hurt. > > Asserts do *not* clarify anything; if you want your optional flag, > come up with clear description of its semantics. In terms of state, > _not_ control flow. > It is supposed to indicate that both nd->path.mnt and nd->path.dentry are no longer usable and must not even be looked at. Ideally code which *does* look at them despite the flag (== there is a bug) traps. However, I did not find a handy macro or anything of the sort to "poison" these pointers. Instead I found tons of NULL checks all over, including in lookup clean up. So as is I agree the flag adds next to nothing as is, but the intent was to catch any use of the above pointers after what they point to got consumed. Perhaps I should just drop the flag for the time being and only propose it with a more fleshed out scheme later. > > @@ -3683,6 +3685,7 @@ static const char *open_last_lookups(struct nameidata *nd, > > static int do_open(struct nameidata *nd, > > struct file *file, const struct open_flags *op) > > { > > + struct vfsmount *mnt; > > struct mnt_idmap *idmap; > > int open_flag = op->open_flag; > > bool do_truncate; > > @@ -3720,11 +3723,22 @@ static int do_open(struct nameidata *nd, > > error = mnt_want_write(nd->path.mnt); > > if (error) > > return error; > > + /* > > + * We grab an additional reference here because vfs_open_consume() > > + * may error out and free the mount from under us, while we need > > + * to undo write access below. > > + */ > > + mnt = mntget(nd->path.mnt); > > It's "after vfs_open_consume() we no longer own the reference in nd->path.mnt", > error or no error... > ok > > do_truncate = true; > > > > } > > error = may_open(idmap, &nd->path, acc_mode, open_flag); > > - if (!error && !(file->f_mode & FMODE_OPENED)) > > - error = vfs_open(&nd->path, file); > > + if (!error && !(file->f_mode & FMODE_OPENED)) { > > + BUG_ON(nd->state & ND_PATH_CONSUMED); > > + error = vfs_open_consume(&nd->path, file); > > + nd->state |= ND_PATH_CONSUMED; > > + nd->path.mnt = NULL; > > + nd->path.dentry = NULL; > > Umm... The thing that feels wrong here is that you get an extra > asymmetry with ->atomic_open() ;-/ We obviously can't do that > kind of thing there (if nothing else, we have the parent directory's > inode to unlock, error or no error). > > I don't hate that patch, but it really feels like the calling > conventions are not right. Let me try to tweak it a bit and > see if anything falls out... Should you write your own patch I'm happy to give it a spin to validate the win is about the same. I forgot to note some stuff when sending the patch, so here it is: perf is highly unstable between re-runs of the benchmark because struct inode itself is not aligned to anything and at least ext4 plops it in in a rather arbitrary place and uses a kmem cache without any magic alignment guarantees. This results in false sharing showing up and disappearing depending on how (un)lucky one gets. For reported results I picked the worst case. I had to patch one sucker for constantly getting in: https://lore.kernel.org/linux-security-module/20240806133607.869394-1-mjguzik@xxxxxxxxx/T/#u -- Mateusz Guzik <mjguzik gmail.com>