On Sun, 18 Feb 2024 at 10:08, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > The only ugliness I see is the one that comes from the original code - > I'm not thrilled about the "return -EAGAIN" part, and I think that if > we found a previously stashed entry after all, we should loop. > > But I think that whole horror comes from a fear of an endless loop > when the dentry is marked dead, and another CPU still sees the old > value (so you can't re-use it, and yet it's not NULL). Actually, I think this is fairly easily fixable, but let's fix it *after* you've done your cleanups. The eventual fix is fairly simple: allow installing a new dentry not just as a replacement for a previous NULL dentry, but also replacing a previous dead dentry. That requires just two simple changes: - the ->d_prune() callback should no longer just blindly set the stashed value to NULL, it would do // Somebody could have re-used our stash as the dentry // died, so we only NULL it out of if matches our pruned one cmpxchg(&stashed, dentry, NULL); - when installing, instead of doing if (cmpxchg(&stashed, NULL, dentry) .. FAIL .. we'd loop with something like this: guard(rcu)(); for (;;) { struct dentry *old; // Assume any old dentry was cleared out old = cmpxchg(&stashed, NULL, dentry); if (likely(!old)) break; // Maybe somebody else installed a good dentry // .. so release ours and use the new one if (lockref_get_not_dead(&old->d_lockref)) { d_delete(dentry); dput(dentry); return old; } // There's an old dead dentry there, try to take it over if (likely(try_cmpxchg(&stashed, old, dentry))) break; // Ooops, that failed, to back and try again cpu_relax(); } // We successfully installed our dentry // as the new stashed one return dentry; which really isn't doesn't look that complicated (note the RCU guard as a way to make sure this all runs RCU-locked without needing to worry about the unlock sequence). Note: your initial optimistic "get_stashed_dentry()" stays exactly as it is. The above loop is just for the "oh, we didn't trivially re-use an old dentry, so now we need to allocate a new one and install it" case. Anyway, the above is written just in the MUA, there's no testing of the above, and again - I think this should be done *after* you've done the cleanups of the current code. But I think it would clarify the odd race condition with an old dentry dying just as a new one is created, and make sure there isn't some -EAGAIN case that people need to worry about. Because either we can re-use the old one, or there isn't an old one, or we find a dead one that can't be reused but can just be replaced. Fairly straightforward, no? Linus