There has been a longstanding race condition between rename(2) and link(2),
when those operations are done in parallel:

1. Moving a file to an existing target file (eg. mv file target)
2. Creating a link from the target file to a third file (eg. ln target link)

By the time vfs_link() locks the target inode, it might already be unlinked
by rename.  This results in vfs_link() returning -ENOENT in order to
prevent linking to already unlinked files.  This check was introduced in
v2.6.39 by commit aae8a97d3ec3 ("fs: Don't allow to create hardlink for
deleted file").

This breaks apparent atomicity of rename(2), which is described in
standards and the man page:

    "If newpath already exists, it will be atomically replaced, so that
     there is no point at which another process attempting to access
     newpath will find it missing."

The simplest fix is to exclude renames for the complete link operation.

This patch introduces a global rw_semaphore that is locked for read in
rename and for write in link.  To prevent excessive contention, do not take
the lock in link on the first try.  If the source of the link was found to
be unlinked, then retry with the lock held.

Reproducer can be found at:

  https://lore.kernel.org/all/20220216131814.GA2463301@xavier-xps/

Reported-by: Xavier Roche <xavier.roche@xxxxxxxxxxx>
Link: https://lore.kernel.org/all/20220214210708.GA2167841@xavier-xps/
Fixes: aae8a97d3ec3 ("fs: Don't allow to create hardlink for deleted file")
Tested-by: Xavier Roche <xavier.roche@xxxxxxxxxxx>
Signed-off-by: Miklos Szeredi <mszeredi@xxxxxxxxxx>
---
 fs/namei.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3f1829b3ab5b..dd6908cee49d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -122,6 +122,8 @@
  * PATH_MAX includes the nul terminator --RR.
  */
 
+static DECLARE_RWSEM(link_rwsem);
+
 #define EMBEDDED_NAME_MAX	(PATH_MAX - offsetof(struct filename, iname))
 
 struct filename *
@@ -2961,6 +2963,8 @@ struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
 {
 	struct dentry *p;
 
+	down_read(&link_rwsem);
+
 	if (p1 == p2) {
 		inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
 		return NULL;
@@ -2995,6 +2999,8 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
 		inode_unlock(p2->d_inode);
 		mutex_unlock(&p1->d_sb->s_vfs_rename_mutex);
 	}
+
+	up_read(&link_rwsem);
 }
 EXPORT_SYMBOL(unlock_rename);
 
@@ -4456,6 +4462,7 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 	struct path old_path, new_path;
 	struct inode *delegated_inode = NULL;
 	int how = 0;
+	bool lock = false;
 	int error;
 
 	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0) {
@@ -4474,10 +4481,13 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 
 	if (flags & AT_SYMLINK_FOLLOW)
 		how |= LOOKUP_FOLLOW;
+retry_lock:
+	if (lock)
+		down_write(&link_rwsem);
 retry:
 	error = filename_lookup(olddfd, old, how, &old_path, NULL);
 	if (error)
-		goto out_putnames;
+		goto out_unlock_link;
 
 	new_dentry = filename_create(newdfd, new, &new_path,
 					(how & LOOKUP_REVAL));
@@ -4511,8 +4521,16 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 		how |= LOOKUP_REVAL;
 		goto retry;
 	}
+	if (!lock && error == -ENOENT) {
+		path_put(&old_path);
+		lock = true;
+		goto retry_lock;
+	}
 out_putpath:
 	path_put(&old_path);
+out_unlock_link:
+	if (lock)
+		up_write(&link_rwsem);
 out_putnames:
 	putname(old);
 	putname(new);
-- 
2.34.1
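
For readers who want to see the symptom from userspace, the race in the
commit message can be approximated with a small test program.  The
following is only a minimal sketch and is not the reproducer linked above:
the helper names touch_at() and count_link_enoent(), the scratch directory
under /tmp, and the file names are all assumptions made for illustration.
A child process keeps renaming "file" over "target" while the parent
repeatedly hard-links "target" and counts the attempts that fail with
ENOENT.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file relative to dfd if missing. */
static int touch_at(int dfd, const char *name)
{
	int fd = openat(dfd, name, O_CREAT | O_WRONLY, 0600);

	if (fd < 0)
		return -1;
	return close(fd);
}

/*
 * Hypothetical helper: race renameat(2) against linkat(2) in a scratch
 * directory.  Returns the number of linkat() calls that failed with
 * ENOENT, or -1 if the test could not be set up.
 */
long count_link_enoent(long attempts)
{
	char dir[] = "/tmp/link-race-XXXXXX";	/* assumed scratch location */
	pid_t parent = getpid(), pid;
	long i, result = -1;
	int dfd;

	if (!mkdtemp(dir))
		return -1;
	dfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (dfd < 0)
		goto out_rmdir;
	if (touch_at(dfd, "target") < 0)
		goto out_close;

	pid = fork();
	if (pid < 0)
		goto out_close;
	if (pid == 0) {
		/* Child: keep replacing "target" while the parent is alive. */
		while (getppid() == parent) {
			if (touch_at(dfd, "file") == 0)
				renameat(dfd, "file", dfd, "target");
		}
		_exit(0);
	}

	/* Parent: the name "target" always exists, so ENOENT is unexpected. */
	result = 0;
	for (i = 0; i < attempts; i++) {
		if (linkat(dfd, "target", dfd, "link", 0) == 0)
			unlinkat(dfd, "link", 0);
		else if (errno == ENOENT)
			result++;
	}

	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
out_close:
	/* Best-effort cleanup of whatever the two processes left behind. */
	unlinkat(dfd, "file", 0);
	unlinkat(dfd, "target", 0);
	unlinkat(dfd, "link", 0);
	close(dfd);
out_rmdir:
	rmdir(dir);
	return result;
}
```

Calling count_link_enoent(100000) from main() and printing the result
should be enough to try it.  On a kernel that still has the race the count
may be nonzero, depending on timing; with this patch applied the
expectation is that it stays at zero, though a clean run of a sketch like
this does not prove the race is gone.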