On Tue, Jul 2, 2024 at 6:47 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 2 Jul 2024 at 05:10, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> >
> > Well there is also the option of going full RCU in the fast path, which
> > I mentioned last time around lockref.
> >
> > This would be applicable at least for stat, fstatx, readlink and access
> > syscalls.
>
> Yes. That would be the optimal thing - have some "don't take a lockref
> on the last component at all, because we will finish the use of it
> under RCU".
>
> I looked at that some time ago, and it didn't look _horrendous_ from a
> conceptual standpoint, but the details just got to be nasty.
>
> What I wanted to do was to hook into the "we're still in RCU mode"
> with a callback that stat could set.
>
> And we'd call it at complete_walk() -> try_to_unlazy() ->
> legitimize_path() time just before we do that lockref_get_not_dead()
> thing.
>
> So then the path walkers that are ok with RCU state (ie mostly just
> 'stat()' and friends) could set that callback, and get a callback
> while the path walk is still in RCU mode, and could fill in the stat
> data then and say "I'm done" and we'd never actually finalize the path
> at all, and never do the final lockref_get_not_dead().
>
> Sounds simple in theory. And then when I looked at doing the actual
> code patch, I ended up just running away scared.

I was thinking of a different approach: a lookup variant which resolves
everything and returns the dentry plus an indication of whether the walk
is still in RCU mode. If not, the regular handling + path_put() sort it
out. If yes, the fast path handling gets involved.

If a filesystem can provide a custom callback for the regular usage
above, there would be an optional callback for RCU mode as well (and it
would be illegal to provide only one of the two). Should this run into
any trouble, it can return -EAGAIN, at which point a
try_to_get_actual_full_ref() routine (but better named) is called and it
tries to get the actual ref.
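For illustration only, here is a rough single-threaded userspace model of that flow, including a sequence-count validation step. Everything in it is hypothetical: `model_dentry`, `model_ops`, `model_stat()` and the callbacks are invented names, C11 atomics stand in for RCU and the dentry seqcount, and none of it is a kernel API. It tries the RCU-mode callback without taking a reference, treats -EAGAIN or a changed sequence count as "redo it the old-fashioned way", and only then pins the object.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry: a sequence counter that writers make
 * odd while updating and bump again when done, the attribute stat() would
 * report, and a reference count that only the slow path touches. */
struct model_dentry {
	atomic_uint seq;
	atomic_long size;
	atomic_int refcount;
};

/* Hypothetical per-filesystem callbacks. Per the proposal, a filesystem with
 * a custom regular callback must also provide the RCU-mode one. */
struct model_ops {
	int (*getattr_rcu)(struct model_dentry *d, long *out); /* no ref held */
	int (*getattr)(struct model_dentry *d, long *out);     /* ref held */
};

/* Example RCU-mode callback: reads the attribute without taking a ref. */
static int model_getattr_rcu(struct model_dentry *d, long *out)
{
	*out = atomic_load(&d->size);
	return 0;
}

/* Example RCU-mode callback that cannot finish without a ref. */
static int model_getattr_rcu_bail(struct model_dentry *d, long *out)
{
	(void)d;
	(void)out;
	return -EAGAIN;
}

/* Example regular callback, which expects a reference to be held. */
static int model_getattr(struct model_dentry *d, long *out)
{
	assert(atomic_load(&d->refcount) > 0);
	*out = atomic_load(&d->size);
	return 0;
}

/* Try the RCU-mode callback first; if it bails with -EAGAIN or the sequence
 * check shows something changed, redo the work with a real reference. */
static int model_stat(struct model_dentry *d, const struct model_ops *ops,
		      long *out, bool *fast)
{
	unsigned start = atomic_load(&d->seq);

	if ((start & 1) == 0) {
		int err = ops->getattr_rcu(d, out);

		/* Validate: the result only counts if nothing changed meanwhile. */
		if (err != -EAGAIN && atomic_load(&d->seq) == start) {
			*fast = true;
			return err;
		}
	}

	/* Slow path: pin the object, use the regular callback, drop the pin. */
	*fast = false;
	atomic_fetch_add(&d->refcount, 1);
	int err = ops->getattr(d, out);
	atomic_fetch_sub(&d->refcount, 1);
	return err;
}
```

When `fast` comes back true, no reference was ever taken. Real code would also have to keep the RCU read-side critical section open across the callback and the check, which this model does not show.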
Suppose the callback or in-place handling worked out. Then a routine to
validate that nothing changed (at least the dentry seq?) is called.
Should it succeed, that's it; otherwise the entire thing redoes the work
the old-fashioned way.

I have not looked too closely yet, but I think this is very much doable
without much swearing. I am going to look into it after I find some
time, maybe this weekend.

Regardless of the above, I think decoupling the actual dentry ref from
d_lock is a valuable step anyway, and I am going to take a stab at that
too. Most of the work is kind of already done, with the 1->0 transition
already handled. I just need to replace the non-atomic updates with
atomics, and use cmpxchg with a flag to whack new additions.

All that aside, the lockref patch reported here needs to get dropped
from the tree, and I don't think a lockref-specific replacement is
viable.

-- 
Mateusz Guzik <mjguzik gmail.com>
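The refcount decoupling can likewise be sketched with C11 atomics. This is a hypothetical model, not the kernel's lockref or dcache API: `model_ref`, `MODEL_REF_DEAD` and the helpers are invented names. The count lives in a single atomic word, gets use a compare-and-swap loop that refuses to succeed once a dead flag is set, and killing is a cmpxchg from 0 to the flag, so it fails if someone grabbed a new reference in the meantime. Zero and dead are kept as distinct states, loosely mirroring how unreferenced dentries can stay cached.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical encoding: the top bit marks the object dead, the rest counts
 * references. Nothing here needs a spinlock. */
#define MODEL_REF_DEAD 0x80000000u

struct model_ref {
	atomic_uint count;
};

/* Take a reference unless the object has been marked dead. */
static bool model_ref_get_not_dead(struct model_ref *r)
{
	unsigned old = atomic_load(&r->count);

	do {
		if (old & MODEL_REF_DEAD)
			return false;
		/* On failure the CAS reloads 'old' and we re-check the flag. */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
	return true;
}

/* Drop a reference; returns true if this was the 1->0 transition. */
static bool model_ref_put(struct model_ref *r)
{
	unsigned old = atomic_fetch_sub(&r->count, 1);

	/* Dropping a ref on a zero or dead count is a caller bug. */
	assert(old != 0 && !(old & MODEL_REF_DEAD));
	return old == 1;
}

/* Mark an unreferenced object dead so no new references can appear. Fails
 * if a concurrent get raced in and the count is no longer zero. */
static bool model_ref_kill(struct model_ref *r)
{
	unsigned expected = 0;

	return atomic_compare_exchange_strong(&r->count, &expected,
					      MODEL_REF_DEAD);
}
```

One thing the model leaves out: with lockref, holding the spinlock also keeps the count stable, because the lockless cmpxchg path only proceeds while the lock is free. Code relying on that property would need auditing once the count no longer lives next to d_lock.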