On Tue, Aug 6, 2024 at 6:09 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote: > > On Tue, Aug 6, 2024 at 5:53 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > On Tue, Aug 06, 2024 at 04:46:28PM +0200, Mateusz Guzik wrote: > > > > > The flag thing is optional and can be dropped, but I think the general > > > direction should be to add *more* asserts and whatnot (even if they are > > > to land separately). A debug-only variant would not hurt. > > > > Asserts do *not* clarify anything; if you want your optional flag, > > come up with clear description of its semantics. In terms of state, > > _not_ control flow. > > > > It is supposed to indicate that both nd->path.mnt and nd->path.dentry > are no longer usable and must not even be looked at. Ideally code > which *does* look at them despite the flag (== there is a bug) traps. > > However, I did not find a handy macro or anything of the sort to > "poison" these pointers. Instead I found tons of NULL checks all over, > including in lookup clean up. > > So as is I agree the flag adds next to nothing as is, but the intent > was to catch any use of the above pointers after what they point to > got consumed. Perhaps I should just drop the flag for the time being > and only propose it with a more fleshed out scheme later. > > > > @@ -3683,6 +3685,7 @@ static const char *open_last_lookups(struct nameidata *nd, > > > static int do_open(struct nameidata *nd, > > > struct file *file, const struct open_flags *op) > > > { > > > + struct vfsmount *mnt; > > > struct mnt_idmap *idmap; > > > int open_flag = op->open_flag; > > > bool do_truncate; > > > @@ -3720,11 +3723,22 @@ static int do_open(struct nameidata *nd, > > > error = mnt_want_write(nd->path.mnt); > > > if (error) > > > return error; > > > + /* > > > + * We grab an additional reference here because vfs_open_consume() > > > + * may error out and free the mount from under us, while we need > > > + * to undo write access below. > > > + */ > > > + mnt = mntget(nd->path.mnt); > > > > It's "after vfs_open_consume() we no longer own the reference in nd->path.mnt", > > error or no error... > > > > ok > > > > do_truncate = true; > > > > > > > } > > > error = may_open(idmap, &nd->path, acc_mode, open_flag); > > > - if (!error && !(file->f_mode & FMODE_OPENED)) > > > - error = vfs_open(&nd->path, file); > > > + if (!error && !(file->f_mode & FMODE_OPENED)) { > > > + BUG_ON(nd->state & ND_PATH_CONSUMED); > > > + error = vfs_open_consume(&nd->path, file); > > > + nd->state |= ND_PATH_CONSUMED; > > > + nd->path.mnt = NULL; > > > + nd->path.dentry = NULL; > > > > Umm... The thing that feels wrong here is that you get an extra > > asymmetry with ->atomic_open() ;-/ We obviously can't do that > > kind of thing there (if nothing else, we have the parent directory's > > inode to unlock, error or no error). > > > > I don't hate that patch, but it really feels like the calling > > conventions are not right. Let me try to tweak it a bit and > > see if anything falls out... > fwiw if you are looking to have vfs_open_${keyword} which accepts nameidata and "consumes" it, I did not go for it because vfs_open is in another file and I did not want to "leak" the struct there. > Should you write your own patch I'm happy to give it a spin to > validate the win is about the same. > > I forgot to note some stuff when sending the patch, so here it is: > perf is highly unstable between re-runs of the benchmark because > struct inode itself is not aligned to anything and at least ext4 plops > it in in a rather arbitrary place and uses a kmem cache without any > magic alignment guarantees. This results in false sharing showing up > and disappearing depending on how (un)lucky one gets. For reported > results I picked the worst case. > > I had to patch one sucker for constantly getting in: > https://lore.kernel.org/linux-security-module/20240806133607.869394-1-mjguzik@xxxxxxxxx/T/#u > > -- > Mateusz Guzik <mjguzik gmail.com> -- Mateusz Guzik <mjguzik gmail.com>