On 08/11, Dave Chinner wrote:
>
> On Mon, Aug 10, 2015 at 04:59:42PM +0200, Jan Kara wrote:
> >
> > One would like to construct the lock chain as:
> >
> > CPU0 (chown foo dir)     CPU1 (readdir dir)       CPU2 (page fault)
> > process Y                process X, thread 0      process X, thread 1
> >
> >                          get ILOCK for dir
> > gets freeze protection
> > starts transaction in xfs_setattr_nonsize
> > waits to get ILOCK on 'dir'
> >                                                   get mmap_sem for X
> >                          wait for mmap_sem for process X
> >                            in filldir()
> >                                                   wait for freeze protection in
> >                                                     xfs_page_mkwrite
> >
> > and CPU3 then being in freeze_super() blocking CPU2 and waiting for CPU0 to
> > finish its freeze-protected section. But this cannot happen. The reason is
> > that we block writers level-by-level and thus while there are writers at
> > level X, we do not block writers at level X+1. So in this particular case
> > freeze_super() will block waiting for CPU0 to finish its freeze protected
> > section while CPU2 is free to continue.
> >
> > In general we have a chain like
> >
> > freeze L0 -> freeze L1 -> freeze L2 -> ILOCK -> mmap_sem --\
> >                 A                                          |
> >                 \------------------------------------------/
> >
> > But since ILOCK is always acquired with freeze protection at L0 and we can
> > block at L1 only after there are no writers at L0, this loop can never
> > happen.
> >
> > Note that if we use the property of freezing that lock at level X+1 cannot
> > block when we hold lock at level X, we can as well simplify the dependency
> > graph and track in it only the lowest level of freeze lock that is
> > currently acquired (since the levels above it cannot block and do not in
> > any way influence blocking of other processes either and thus are
> > irrelevant for the purpose of deadlock detection). Then the dependency
> > graph we'd get would be:
> >
> > freeze L0 -> ILOCK -> mmap_sem -> freeze L1
> >
> > and we have a nice acyclic graph we like to see...
> > So probably we have to
> > hack the lockdep instrumentation some more and just don't tell lockdep
> > about freeze locks at higher levels if we already hold a lock at lower
> > level. Thoughts?
>
> The XFS directory ilock->filldir->might_fault locking path has been
> generating false positives in quite a lot of places because of
> things we do on one side of the mmap_sem in filesystem paths vs
> things we do on the other side of the mmap_sem in the page fault
> path.

OK. Dave, Jan, thanks a lot.

I was also confused because I didn't know that the "Chain exists of" part of
print_circular_bug() only prints the _partial_ chain, and I have to admit
that I do not even understand which part it actually shows...

I'll drop

	move rwsem_release() from sb_wait_write() to freeze_super()
	change thaw_super() to re-acquire s_writers.lock_map

from the previous series and resend everything. Let's change sb_writers to
use percpu_rw_semaphore first, then try to improve the lockdep annotations.

See the interdiff below. With this change I have

	TEST_DEV=/dev/loop0 TEST_DIR=TEST SCRATCH_DEV=/dev/loop1 SCRATCH_MNT=SCRATCH \
	./check `grep -il freeze tests/*/???`

	...

	Ran: generic/068 generic/085 generic/280 generic/311 xfs/011 xfs/119 xfs/297
	Passed all 7 tests

anything else I should test?

Oleg.

this needs a comment in sb_wait_write() to explain that this is not what
we want.

--- a/fs/super.c
+++ b/fs/super.c
@@ -1215,27 +1215,15 @@ EXPORT_SYMBOL(__sb_start_write);
 static void sb_wait_write(struct super_block *sb, int level)
 {
 	percpu_down_write(sb->s_writers.rw_sem + level-1);
+	percpu_rwsem_release(sb->s_writers.rw_sem + level-1, 0, _THIS_IP_);
 }
 
-static void sb_freeze_release(struct super_block *sb)
-{
-	int level;
-	/* Avoid the warning from lockdep_sys_exit() */
-	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
-		percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
-}
-
-static void sb_freeze_acquire(struct super_block *sb)
+static void sb_freeze_unlock(struct super_block *sb)
 {
 	int level;
 
 	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
 		percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
-}
-
-static void sb_freeze_unlock(struct super_block *sb)
-{
-	int level;
 
 	for (level = SB_FREEZE_LEVELS; --level >= 0; )
 		percpu_up_write(sb->s_writers.rw_sem + level);
@@ -1331,7 +1319,6 @@ int freeze_super(struct super_block *sb)
 	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
-	sb_freeze_release(sb);
 	up_write(&sb->s_umount);
 	return 0;
 }
@@ -1358,14 +1345,11 @@ int thaw_super(struct super_block *sb)
 		goto out;
 	}
 
-	sb_freeze_acquire(sb);
-
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
 		if (error) {
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
-			sb_freeze_release(sb);
 			up_write(&sb->s_umount);
 			return error;
 		}
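
As a sanity check of Jan's argument quoted above, the two dependency
graphs can be compared with a small userspace sketch. This is not kernel
code and not how lockdep works internally; the enum names, struct
dep_graph and all helpers below are made up for illustration. It only
shows that the chain with every freeze level recorded contains a loop
through "freeze L1", while the graph that tracks only the lowest held
freeze level does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical node names for this sketch; they are not lockdep classes. */
enum node { FREEZE_L0, FREEZE_L1, FREEZE_L2, ILOCK, MMAP_SEM, NR_NODES };

struct dep_graph {
	/* edge[a][b] is true if b is taken while a is held */
	bool edge[NR_NODES][NR_NODES];
};

static void add_dep(struct dep_graph *g, enum node held, enum node taken)
{
	assert(held < NR_NODES && taken < NR_NODES);
	g->edge[held][taken] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool visit(const struct dep_graph *g, int n, int *state)
{
	state[n] = 1;
	for (int m = 0; m < NR_NODES; m++) {
		if (!g->edge[n][m])
			continue;
		if (state[m] == 1)
			return true;	/* back edge: dependency loop */
		if (state[m] == 0 && visit(g, m, state))
			return true;
	}
	state[n] = 2;
	return false;
}

static bool has_cycle(const struct dep_graph *g)
{
	int state[NR_NODES] = { 0 };

	for (int n = 0; n < NR_NODES; n++)
		if (state[n] == 0 && visit(g, n, state))
			return true;
	return false;
}

/* The general chain from the quoted mail, every freeze level recorded. */
static void build_full_graph(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
	add_dep(g, FREEZE_L0, FREEZE_L1);
	add_dep(g, FREEZE_L1, FREEZE_L2);
	add_dep(g, FREEZE_L2, ILOCK);
	add_dep(g, ILOCK, MMAP_SEM);
	add_dep(g, MMAP_SEM, FREEZE_L1);	/* page fault path */
}

/* The simplified graph: only the lowest held freeze level is recorded. */
static void build_simplified_graph(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
	add_dep(g, FREEZE_L0, ILOCK);
	add_dep(g, ILOCK, MMAP_SEM);
	add_dep(g, MMAP_SEM, FREEZE_L1);
}
```

Calling has_cycle() after build_full_graph() reports a loop, and after
build_simplified_graph() it does not, which matches the "nice acyclic
graph" in the quoted text. Whether lockdep can be taught to record only
the lowest level is a separate question that this sketch does not answer.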