On Wed, 24 May 2023 at 18:35, Jan Kara <jack@xxxxxxx> wrote: > > Hello! > > This is again about the problem with directory renames I've already > reported in [1]. To quickly sum it up some filesystems (so far we know at > least about xfs, ext4, udf, reiserfs) need to lock the directory when it is > being renamed into another directory. This is because we need to update the > parent pointer in the directory in that case and if that races with other > operation on the directory, bad things can happen. > > So far we've done the locking in the filesystem code but recently Darrick > pointed out [2] that we've missed the RENAME_EXCHANGE case in our ext4 fix. > That one is particularly nasty because RENAME_EXCHANGE can arbitrarily mix > regular files and directories. Couple nasty arising cases: > > 1) We need to additionally lock two exchanged directories. Suppose a > situation like: > > mkdir P; mkdir P/A; mkdir P/B; touch P/B/F > > CPU1 CPU2 > renameat2("P/A", "P/B", RENAME_EXCHANGE); renameat2("P/B/F", "P/A", 0); Not sure I get it. CPU1 locks P then A then B CPU2 locks P then B then A Both start with P and after that ordering between A and B doesn't matter as long as the topology stays the same, which is guaranteed. Or did you mean renameat2("P/B/F", "P/A/F", 0);? This indeed looks deadlocky. > > Both operations need to lock A and B directories which are unrelated in the > tree. This means we must establish stable lock ordering on directory locks > even for the case when they are not in ancestor relationship. > > 2) We may need to lock a directory and a non-directory and they can be in > parent-child relationship when hardlinks are involved: > > mkdir A; mkdir B; touch A/F; ln A/F B/F > renameat2("A/F", "B"); > > And this is really nasty because we don't have a way to find out whether > "A/F" and "B" are in any relationship - in particular whether B happens to > be another parent of A/F or not. > > What I've decided to do is to make sure we always lock directory first in > this mixed case and that *should* avoid all the deadlocks but I'm spelling > this out here just in case people can think of some even more wicked case > before I'll send patches. Locking directories first has always been the case, described in detail in Documentation/filesystems/directory-locking.rst > Also I wanted to ask (Miklos in particular as RENAME_EXCHANGE author): Why > do we lock non-directories in RENAME_EXCHANGE case? If we didn't have to do > that things would be somewhat simpler... I can't say I remember anything, but digging into lock_two_nondirectories() this comes up quickly: 6cedba8962f4 ("vfs: take i_mutex on renamed file") So apparently NFS is relying on i_mutex to prevent delegations from being broken without its knowledge. Might be that is't NFS only, and then the RENAME_EXCHANGE case doesn't need it (NFS doesn't support RENAME_EXCHANGE), but I can't say for sure. Also Al seems to have had some thoughts on this in d42b386834ee ("update D/f/directory-locking") Thanks, Miklos