On Tue, 2 Jul 2024 at 10:58, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote: > > Suppose the rcu fast path lookup reads the dentry seqc, then does all > the legitimize_mnt and other work. Everything, except modifying the > lockref. The caller is given a mnt to put (per-cpu scalable), dentry > seqc read before any of the path validation and an indication this is > rcu. Yes. > Then after whatever is done if the seqc still matches this is the same > as if there was lockref get/put around it. So this is partly why I was thinking of a callback. That "check sequence number afterwards" is still important. And if it's a callback, it can be done in the path walking code, and it can go on and say "oh, I'll need to redo this without RCU". If it's a "we returned a dentry under RCU", suddenly the caller has to know this about the name lookup and do the repeating by hand. And as long as we don't expose it to modules and only use it for "stat()" and friends, I'm ok with it, but I'm just saying that it's all a bit scary. > The only worry is pointers suddenly going NULL or similar as > dentry/inode is looked at. To be worked out on per-syscall basis. We have subtle rules wrt dentry->d_inode. It can indeed become NULL at any time during the RCU walk, since what protects it is the d_lock and the dentry count. The inode itself is then RCU-free'd, so it will *exist*, but you can't just blindly use dentry->d_inode itself while under RCU. Which is why it's cached in 'struct nameidata', and we validate it with nd->seq when it's loaded. And why things like may_lookup() use nd->inode, not the dentry. And that's another rule that we probably should aim to not have escape from the path walking as an interface. Because it's much too easy to do struct inode *inode = d_backing_inode(path->dentry); but that's just wrong during the RCU path walk. Again, having this be a callback during the walk would avoid issues like this. The callback can just pass in the separate inode pointer. And then a sequence point failure will return -ECHILD and do the walk again, while a callback success with all the sequence numbers matching would return -ECALLBACK or whatever, so that the caller would know "the stat information was already successfully completed by the callback". Anyway, that was my handwavy "this is why I was thinking of a callback" thing. But it's also an example of just how nasty and subtle this all is. But I'm convinced this is all eminently *solvable*. There's nothing fundamental here. Just a lot of small nasty details. Linus