On Wed, Sep 14, 2022 at 10:14:44AM +1000, NeilBrown wrote: > But they can race. The open completes the lookup of /tmp/share-dir and > holds the dentry, but the rename succeeds with inode_lock(target) in the > code fragment you provided above before the open() can get the lock. > By the time open() does get the lock, the dentry it holds will be marked > S_DEAD and the __lookup_hash() will fail. > So the open() returns -ENOENT - unexpected. > > Test code below, based on the test code for the link race. > > I wonder if S_DEAD should result in -ESTALE rather than -ENOENT. > That would cause the various retry_estale() loops to retry the whole > operation. I suspect we'd actually need more subtlety than just that > simple change, but it might be worth exploring. I don't think that wording in POSIX prohibits that ENOENT. It does not (and no UNIX I've ever seen provides) atomicity of all link traversals involved in pathname resolution. root@sonny:/tmp# mkdir fubar root@sonny:/tmp# cd fubar/ root@sonny:/tmp/fubar# rmdir ../fubar root@sonny:/tmp/fubar# touch ./foo touch: cannot touch './foo': No such file or directory Do you agree that this is a reasonable behaviour? That, BTW, is why "retry on S_DEAD" is wrong - it can be a persistent state. Now, replace rmdir ../fubar with rename("/tmp/barf", "/tmp/fubar") and you'll get pretty much your testcase. You are asking not just for atomicity of rename vs. traversal of "fubar" in /tmp - that we already have. You are asking for the atomicity of the entire pathname resolution of open() argument wrt rename(). And that is a much scarier can of worms. Basically, pathname resolution happens on per-component basis and it's *NOT* guaranteed that filesystem won't be changed between those or that we won't observe such modifications. If you check POSIX, you'll see that it avoids any such promises - all it says (for directories; non-directory case has similar verbiage) is this: If the directory named by the new argument exists, it shall be removed and old renamed to new. In this case, a link named new shall exist throughout the renaming operation and shall refer either to the directory referred to by new or old before the operation began. It carefully stays the hell out of any pathname resolution atomicity promises - all it's talking about is resolution of a single link. It's enough to guarantee that rename("old", "new") won't race with mkdir("new") allowing the latter to succeed, with similar for creat(2), etc. - "new" is promised to refer to an existing object at all times. And that's what rename() atomicity is normally for. Operation on the link in question will observe either old or new state; operation that passed through that link on the way to whatever it will act upon has *also* observed either of those states - and the next pathname component had been looked up in either old or new directory, but there is no promise that nothing has happened to that directory since we got there.