On Tue, 2 Jul 2024 at 14:15, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote: > > I asked you something in the previous e-mail though (with some nastiness > of the problem pointed out) concerning handling of slow vs fastpath, > here it is again: > > [..]for example did you know xfs does not honor rcu grace periods when > recycling inodes? I don't think it should matter. Yes, the vfs layer will access the inode, but then the sequence number checks should always verify that whatever access we did is then validated after-the-fact. And yes, any "stat under RCU" will obviously have to do the sequence number check after having copied the data, the same way we now do it after the "lockref_get_not_dead()". So the RCU part just means that at least the allocation still exists. But it might not have been the *right* inode, and that's the sequence number check. Or do you see something I'm missing? I'm not loving how we deal with dentry->d_inode, but considering that the whole point of the RCU lookup is that we don't hold any locks, I think it's almost unavoidable. > Sounds like you are deadset on the callback approach. Hey, I happen to think that it's the easiest model, but you can easily prove me wrong. Almost all of my worries can probably be handled by just having a separate "do this after the operation" helper function. Also, maybe the "callback" model is the wrong one, and instead the right one is to just say "the only thing that wants this is the 'stat' family", and simply have a "path_statx()" function that is all inside fs/namei.c, and that doesn't take any callbacks, but instead just fills in the 'struct kstat'. But if you have better ideas, go wild. The only thing I really *don't* want is some kind of expansion on the new "filename_lookup()" function that takes a new flag, and that then random people can start using, and that is so fragile that everybody will get it wrong. Your 'vfs_rcu_magic_lookup()' looks like it would work, and we'd just have to make the rule that it's never exported to modules or anything like that, simply because I don't trust random drivers or vendor modules to get it right. > Can you pseudo-code how would the consumer look like in your case? Do > you want the callback to execute for both slow and fastpath and switch > on the flag? It is rather unclear what you are proposing here. So what my half-aresed thinking was on how I'd do it is roughly: - add a new "rcu_callback" function pointer to ' struct nameidata' (or, if deciding to just fill in 'struct kstat', just the pointer to that). - in complete_walk(), where we currently do that if (!try_to_unlazy(nd)) return -ECHILD; instead do something like this: if (nd->rcu_callback) { int ret = nd->rcu_callback(nd); if (ret) { if (!validate_sequence_counts(nd)) ret = -ECHILD; nd->path.mnt = NULL; nd->path.dentry = NULL; leave_rcu(nd); return ret; } } if (!try_to_unlazy(nd)) return -ECHILD; - add a new version of filename_lookup() that takes that enw callback, and just sets in in nameidata before it does that retval = path_lookupat(&nd, flags | LOOKUP_RCU, path); thing. IOW, the point is that the callback can decide that it has gathered enough information, and to basically "abort" the RCU path walk by returning an error. And if it does *not* return an error to abort (or if we never even get there, because something else caused us to exit RCU mode), we act exactly like we used to, and nothing has changed. But if it *does* return an error, we stop walking, never take the refs on the path, and we validate the sequence count. Again, if the sequence count validation failed, we turn the error into ECHILD, so that the path walking just gets re-done without RCU. So a sequence point failure means that we did end up calling the callback function, but we basically discard the result. End result: the path walking will return that new error only if the callback (a) existed, (b) returned that error and (c) the sequence counts still validated properly afterwards. And if for some reason it *cannot* fill in stat data (say, because the filesystem is one that doesn't support "stat under RCU" or whatever), the callback returns zero, and we do all the things we used to do. And the *user* of this would basically just put in a "stat_rcu()" callback that fills in the stat data and returns some error code that path walking can't return, which then informs the caller that the stat data has already been filled in and can be used as-is. So now the caller can look something like this: ret = filename_lookup_with_callback(... callback); if (ret == -ECALLBACK) { // this means that the stat info has already been filled in, copy it to user mode copy_stat_to_user(..); return 0; } // "Real" error if (ret) return ret; // Ok, we have a path, we do what we used to do before error = vfs_getattr(&path, stat, request_mask, flags); ... path_put(&path); Anyway, that was my thinking. It *feels* fairly minimal. But no, I never did the patch, and I might not have thought of some complication, and it is all pretty messy, and no, I do not want to force you into this kind of model if you hate it. I don't think your vfs_rcu_magic_lookup() function is all that fundamentally different, but the advantage of the callback model is that I *think* the above would be a fairly surgical and minimal addition to the path walk. I _think_ your vfs_rcu_magic_lookup() model would require a fair amount of moving code around to break into that path_lookupat -> complete_walk -> try_to_unlazy place and be able to restart etc. But maybe you have a clever plan. Linus