On Tue, 13 Sep 2022, Al Viro wrote: > On Tue, Sep 13, 2022 at 06:20:10AM +0100, Al Viro wrote: > > > > Alternately, lock the "from" directory as well as the "to" directory. > > > That would mean using lock_rename() and generally copying the structure > > > of do_renameat2() into do_linkat() > > > > Ever done cp -al? Cross-directory renames are relatively rare; cross-directory > > links can be fairly heavy on some payloads, and you'll get ->s_vfs_rename_mutex > > held a _lot_. > > > > > I wonder if you could get a similar race trying to create a file in > > > (empty directory) /tmp/foo while /tmp/bar was being renamed over it. > > > > Neil, no offense, but... if you have plans regarding changes in > > directory locking, you might consider reading through the file called > > Documentation/filesystems/directory-locking.rst > > > > Occasionally documentation is where one could expect to find it... > > ... and that "..." above should've been ";-)" - it was not intended as > a dig, especially since locking in that area has subtle and badly > underdocumented parts (in particular, anything related to fh_to_dentry(), > rules regarding the stability of ->d_name and ->d_parent, mount vs. dentry > invalidation and too many other things), but the basic stuff like that > is actually covered. > > FWIW, the answer to your question is that the victim of overwriting > rename is held locked by caller of ->rename(); combined with the lock > held on directory by anyone who modifies it that prevents the race > you are asking about. > > See > if (!is_dir || (flags & RENAME_EXCHANGE)) > lock_two_nondirectories(source, target); > else if (target) > inode_lock(target); > in vfs_rename(). > I don't think those locks address the race I was thinking of. Suppose a /tmp/shared-dir is an empty directory and one thread runs rename("/tmp/tmp-dir", "/tmp/shared-dir") while another thread runs open("/tmp/shared-dir/Afile", O_CREAT | O_WRONLY) If the first wins the file is created in what was tmp-dir. If the second wins, the rename fails because shared-dir isn't empty. But they can race. The open completes the lookup of /tmp/share-dir and holds the dentry, but the rename succeeds with inode_lock(target) in the code fragment you provided above before the open() can get the lock. By the time open() does get the lock, the dentry it holds will be marked S_DEAD and the __lookup_hash() will fail. So the open() returns -ENOENT - unexpected. Test code below, based on the test code for the link race. I wonder if S_DEAD should result in -ESTALE rather than -ENOENT. That would cause the various retry_estale() loops to retry the whole operation. I suspect we'd actually need more subtlety than just that simple change, but it might be worth exploring. NeilBrown /* Linux rename vs. create race condition. * Rationale: both (1) moving a directory to an empty target and * (2) creating a file in the target can race. * The create should always succeed, but there should always be * directory there, either old or new. * The rename might fail if the target isn't empty. * Sample file courtesy of Xavier Grand at Algolia * g++ -pthread createbug.c -o createbug */ #include <thread> #include <unistd.h> #include <assert.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <iostream> #include <string.h> static const char* producedDir = "/tmp/shared-dir"; static const char* producedTmpDir = "/tmp/new-dir"; static const char* producedThreadFile = "/tmp/shared-dir/Afile"; bool createFile(const char* path) { const int fdOut = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_EXCL | O_CLOEXEC, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH); if (fdOut < 0) return false; assert(write(fdOut, "Foo", 4) == 4); assert(close(fdOut) == 0); return true; } void func() { int nbSuccess = 0; // Loop producedThread create file in dir while (true) { if (createFile(producedThreadFile) != true) { std::cout << "Failed after " << nbSuccess << " with " << strerror(errno) << std::endl; exit(EXIT_FAILURE); } else { nbSuccess++; } unlink(producedThreadFile); } } int main() { // Setup env rmdir(producedTmpDir); unlink(producedThreadFile); rmdir(producedDir); mkdir(producedDir, 0777); // Async thread doing a hardlink and moving it std::thread t(func); // Loop creating a .tmp and moving it while (true) { assert(mkdir(producedTmpDir, 0777) == 0); while (rename(producedTmpDir, producedDir) != 0) unlink(producedThreadFile); } return 0; }