Re: Locking issue with directory renames

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 18, 2023 at 08:44:57AM +1100, Dave Chinner wrote:
> On Tue, Jan 17, 2023 at 01:37:35PM +0100, Jan Kara wrote:
> > Hello!
> > 
> > I've some across an interesting issue that was spotted by syzbot [1]. The
> > report is against UDF but AFAICS the problem exists for ext4 as well and
> > possibly other filesystems. The problem is the following: When we are
> > renaming directory 'dir' say rename("foo/dir", "bar/") we lock 'foo' and
> > 'bar' but 'dir' is unlocked because the locking done by vfs_rename() is
> > 
> >         if (!is_dir || (flags & RENAME_EXCHANGE))
> >                 lock_two_nondirectories(source, target);
> >         else if (target)
> >                 inode_lock(target);
> > 
> > However some filesystems (e.g. UDF but ext4 as well, I suspect XFS may be
> > hurt by this as well because it converts among multiple dir formats) need
> > to update parent pointer in 'dir' and nothing protects this update against
> > a race with someone else modifying 'dir'. Now this is mostly harmless
> > because the parent pointer (".." directory entry) is at the beginning of
> > the directory and stable however if for example the directory is converted
> > from packed "in-inode" format to "expanded" format as a result of
> > concurrent operation on 'dir', the filesystem gets corrupted (or crashes as
> > in case of UDF).
> 
> No, xfs_rename() does not have this problem - we pass four inodes to
> the function - the source directory, source inode, destination
> directory and destination inode.

Um, I think it does have this problem.  xfs_readdir thinks it can parse
a shortform inode without taking the ILOCK:

	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL)
		return xfs_dir2_sf_getdents(&args, ctx);

	lock_mode = xfs_ilock_data_map_shared(dp);
	error = xfs_dir2_isblock(&args, &isblock);

So xfs_dir2_sf_replace can rewrite the shortform structure (or even
convert it to block format!) while readdir is accessing it.  Or am I
mising something?

--D

> In the above case, "dir/" is passed to XFs as the source inode - the
> src_dir is "foo/", the target dir is "bar/" and the target inode is
> null. src_dir != target_dir, so we set the "new_parent" flag. the
> srouce inode is a directory, so we set the src_is_directory flag,
> too.
> 
> We lock all three inodes that are passed. We do various things, then
> run:
> 
>         if (new_parent && src_is_directory) {
>                 /*
>                  * Rewrite the ".." entry to point to the new
>                  * directory.
>                  */
>                 error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot,
>                                         target_dp->i_ino, spaceres);
>                 ASSERT(error != -EEXIST);
>                 if (error)
>                         goto out_trans_cancel;
>         }
> 
> which replaces the ".." entry in source inode atomically whilst it
> is locked.  Any directory format changes that occur during the
> rename are done while the ILOCK is held, so they appear atomic to
> outside observers that are trying to parse the directory structure
> (e.g. readdir).
> 
> > So we'd need to lock 'source' if it is a directory.
> 
> Yup, and XFS goes further by always locking the source inode in a
> rename, even if it is not a directory. This ensures the inode being
> moved cannot have it's metadata otherwise modified whilst the rename
> is in progress, even if that modification would have no impact on
> the rename. It's a pretty strict interpretation of "rename is an
> atomic operation", but it avoids accidentally missing nasty corner
> cases like the one described above...
> 
> > Ideally this would
> > happen in VFS as otherwise I bet a lot of filesystems will get this wrong
> > so could vfs_rename() lock 'source' if it is a dir as well? Essentially
> > this would amount to calling lock_two_nondirectories(source, target)
> > unconditionally but that would become a serious misnomer ;). Al, any
> > thought?
> 
> XFS just has a function that allows for an arbitrary number of
> inodes to be locked in the given order: xfs_lock_inodes(). For
> rename, the lock order is determined by xfs_sort_for_rename().
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux