On Wed, Jan 18, 2023 at 08:44:57AM +1100, Dave Chinner wrote: > On Tue, Jan 17, 2023 at 01:37:35PM +0100, Jan Kara wrote: > > Hello! > > > > I've some across an interesting issue that was spotted by syzbot [1]. The > > report is against UDF but AFAICS the problem exists for ext4 as well and > > possibly other filesystems. The problem is the following: When we are > > renaming directory 'dir' say rename("foo/dir", "bar/") we lock 'foo' and > > 'bar' but 'dir' is unlocked because the locking done by vfs_rename() is > > > > if (!is_dir || (flags & RENAME_EXCHANGE)) > > lock_two_nondirectories(source, target); > > else if (target) > > inode_lock(target); > > > > However some filesystems (e.g. UDF but ext4 as well, I suspect XFS may be > > hurt by this as well because it converts among multiple dir formats) need > > to update parent pointer in 'dir' and nothing protects this update against > > a race with someone else modifying 'dir'. Now this is mostly harmless > > because the parent pointer (".." directory entry) is at the beginning of > > the directory and stable however if for example the directory is converted > > from packed "in-inode" format to "expanded" format as a result of > > concurrent operation on 'dir', the filesystem gets corrupted (or crashes as > > in case of UDF). > > No, xfs_rename() does not have this problem - we pass four inodes to > the function - the source directory, source inode, destination > directory and destination inode. Um, I think it does have this problem. xfs_readdir thinks it can parse a shortform inode without taking the ILOCK: if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL) return xfs_dir2_sf_getdents(&args, ctx); lock_mode = xfs_ilock_data_map_shared(dp); error = xfs_dir2_isblock(&args, &isblock); So xfs_dir2_sf_replace can rewrite the shortform structure (or even convert it to block format!) while readdir is accessing it. Or am I mising something? --D > In the above case, "dir/" is passed to XFs as the source inode - the > src_dir is "foo/", the target dir is "bar/" and the target inode is > null. src_dir != target_dir, so we set the "new_parent" flag. the > srouce inode is a directory, so we set the src_is_directory flag, > too. > > We lock all three inodes that are passed. We do various things, then > run: > > if (new_parent && src_is_directory) { > /* > * Rewrite the ".." entry to point to the new > * directory. > */ > error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot, > target_dp->i_ino, spaceres); > ASSERT(error != -EEXIST); > if (error) > goto out_trans_cancel; > } > > which replaces the ".." entry in source inode atomically whilst it > is locked. Any directory format changes that occur during the > rename are done while the ILOCK is held, so they appear atomic to > outside observers that are trying to parse the directory structure > (e.g. readdir). > > > So we'd need to lock 'source' if it is a directory. > > Yup, and XFS goes further by always locking the source inode in a > rename, even if it is not a directory. This ensures the inode being > moved cannot have it's metadata otherwise modified whilst the rename > is in progress, even if that modification would have no impact on > the rename. It's a pretty strict interpretation of "rename is an > atomic operation", but it avoids accidentally missing nasty corner > cases like the one described above... > > > Ideally this would > > happen in VFS as otherwise I bet a lot of filesystems will get this wrong > > so could vfs_rename() lock 'source' if it is a dir as well? Essentially > > this would amount to calling lock_two_nondirectories(source, target) > > unconditionally but that would become a serious misnomer ;). Al, any > > thought? > > XFS just has a function that allows for an arbitrary number of > inodes to be locked in the given order: xfs_lock_inodes(). For > rename, the lock order is determined by xfs_sort_for_rename(). > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx