Re: Locking for RENAME_EXCHANGE

On Wed, May 24, 2023 at 06:35:04PM +0200, Jan Kara wrote:
> Hello!
> 
> This is again about the problem with directory renames I've already
> reported in [1]. To quickly sum it up, some filesystems (so far we know at
> least about xfs, ext4, udf, reiserfs) need to lock the directory when it is
> being renamed into another directory. This is because we need to update the
> parent pointer in the directory in that case, and if that races with other
> operations on the directory, bad things can happen.
> 
> So far we've done the locking in the filesystem code, but recently Darrick
> pointed out [2] that we've missed the RENAME_EXCHANGE case in our ext4 fix.
> That one is particularly nasty because RENAME_EXCHANGE can arbitrarily mix
> regular files and directories. A couple of nasty cases arise:
> 
> 1) We need to additionally lock two exchanged directories. Suppose a
> situation like:
> 
> mkdir P; mkdir P/A; mkdir P/B; touch P/B/F
> 
> CPU1						CPU2
> renameat2("P/A", "P/B", RENAME_EXCHANGE);	renameat2("P/B/F", "P/A", 0);
> 
> Both operations need to lock A and B directories which are unrelated in the
> tree. This means we must establish stable lock ordering on directory locks
> even for the case when they are not in ancestor relationship.
> 
> 2) We may need to lock a directory and a non-directory and they can be in
> parent-child relationship when hardlinks are involved:
> 
> mkdir A; mkdir B; touch A/F; ln A/F B/F
> renameat2("A/F", "B", RENAME_EXCHANGE);
> 
> And this is really nasty because we don't have a way to find out whether
> "A/F" and "B" are in any relationship - in particular whether B happens to
> be another parent of A/F or not.
> 
> What I've decided to do is to make sure we always lock the directory first
> in this mixed case, and that *should* avoid all the deadlocks, but I'm
> spelling this out here just in case people can think of some even more
> wicked case before I send patches.
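
To make the ordering rule above concrete, here is a rough userspace sketch.
It is not kernel code: struct toy_inode, first_to_lock(), lock_pair() and
unlock_pair() are hypothetical names made up for illustration, and breaking
ties by comparing addresses is only one assumed way to get a stable order
between unrelated objects of the same kind.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; NOT the kernel's struct inode. */
struct toy_inode {
	bool is_dir;
	pthread_mutex_t lock;
};

/*
 * Pick which of two inodes to lock first.
 * Rule 1: in the mixed case, the directory always goes first.
 * Rule 2 (assumption): otherwise the lower address goes first, so every
 * caller agrees on the order even when the two are unrelated in the tree.
 */
static struct toy_inode *first_to_lock(struct toy_inode *a, struct toy_inode *b)
{
	if (a->is_dir != b->is_dir)
		return a->is_dir ? a : b;
	return (uintptr_t)a < (uintptr_t)b ? a : b;
}

/* Lock both inodes in the agreed order; a == b is locked only once. */
static void lock_pair(struct toy_inode *a, struct toy_inode *b)
{
	struct toy_inode *first = first_to_lock(a, b);
	struct toy_inode *second = (first == a) ? b : a;

	pthread_mutex_lock(&first->lock);
	if (second != first)
		pthread_mutex_lock(&second->lock);
}

/* Release both locks; unlock order does not matter for deadlock avoidance. */
static void unlock_pair(struct toy_inode *a, struct toy_inode *b)
{
	pthread_mutex_unlock(&a->lock);
	if (b != a)
		pthread_mutex_unlock(&b->lock);
}
```

With this, lock_pair(&A, &B) and lock_pair(&B, &A) take the two locks in the
same order, which is the property case 1) needs for the unrelated
directories A and B.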

I can think of a few things:

Not taking i_rwsem when updating the dotdot entry as part of moving
child directories means that readdir can race with a dotdot update to a
shortform XFS directory.  That currently doesn't get too ugly because
we reserve a full 8 bytes for the shortform dotdot entry's inode number
(which means the update can't force a format change).  As Dave has
previously noted, xfs synchronizes on the ILOCK for metadata updates.

This also makes online directory fsck harder, because any time we have
to cycle the directory's ILOCK we have to revalidate the directory
contents afterwards, or hook the directory update code so that we can
learn about changes that happened while the ILOCK was dropped.
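
As a rough illustration of the "revalidate after cycling the lock" pattern
(not the actual xfs code), here is a userspace sketch built around a change
counter.  struct toy_dir, change_seq, toy_dir_add_entry() and
toy_dir_cycle_lock() are all hypothetical names, and the pthread mutex only
models the role the ILOCK plays here.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a directory; "ilock" only models the ILOCK. */
struct toy_dir {
	pthread_mutex_t ilock;
	uint64_t change_seq;	/* bumped on every update, under ilock */
	int nr_entries;
};

/* Updater side: every modification bumps the change counter. */
static void toy_dir_add_entry(struct toy_dir *d)
{
	pthread_mutex_lock(&d->ilock);
	d->nr_entries++;
	d->change_seq++;
	pthread_mutex_unlock(&d->ilock);
}

/*
 * Scanner side: called with ilock held, returns with ilock held.
 * Drops the lock, optionally runs a callback while it is dropped, retakes
 * the lock and reports whether the directory is unchanged.  A false return
 * means the scanner must revalidate whatever it had cached.
 */
static bool toy_dir_cycle_lock(struct toy_dir *d,
			       void (*while_unlocked)(struct toy_dir *))
{
	uint64_t seen = d->change_seq;	/* sampled while still locked */

	pthread_mutex_unlock(&d->ilock);
	if (while_unlocked)
		while_unlocked(d);
	pthread_mutex_lock(&d->ilock);

	return d->change_seq == seen;
}
```

A scanner would call toy_dir_cycle_lock() wherever it has to let go of the
lock and rescan on a false return; the hook approach would instead have the
updater notify the scanner directly.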

(I'm sure the rest of you are thinking "that's some crazy xfs thing").

Things get weirder with the new xfs parent pointers feature -- rename
doesn't take i_rwsem of the children, which means that the only
synchronization point is with the ILOCK held.  Parent pointers are
stored in a special xattr namespace.  This makes me nervous about
collisions with a getxattr call, since those can be done with or without
i_rwsem held at all.  I think the ILOCK saves us in this case, since
I've been testing it for a few months now without any corruption
problems.

Obviously, online fsck of parent pointers has to employ the same
revalidate/hook to deal with ILOCK cycling.  But again, that's "some
crazy xfs thing".

I wouldn't mind the VFS taking i_rwsem on the children to simplify
online fsck, but I've got a reasonable enough workaround.

--D

> Also I wanted to ask (Miklos in particular as RENAME_EXCHANGE author): Why
> do we lock non-directories in RENAME_EXCHANGE case? If we didn't have to do
> that things would be somewhat simpler...
> 
> 								Honza
> 
> [1] https://lore.kernel.org/all/20230117123735.un7wbamlbdihninm@quack3
> [2] https://lore.kernel.org/all/20230517045836.GA11594@frogsfrogsfrogs
> 
> -- 
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR


