Jan, Dave, please help. I'll try to re-check/re-test, but so far I think
that this and the previous series are correct. However, 3/4 from the
previous series (attached at the end just in case) uncovers (I think)
some problems in xfs locking. What should I do now?

On 07/22, Oleg Nesterov wrote:
>
> Testing. Well, so far I only verified that ioctl(FIFREEZE) +
> ioctl(FITHAW) seems to work "as expected" on my testing machine
> with ext3. So probably this needs more testing.

Finally I got the testing machine and ran xfstests. I did

	dd if=/dev/zero of=TEST.img bs=1MiB count=4000
	dd if=/dev/zero of=SCRATCH.img bs=1MiB count=4000
	losetup --find --show TEST.img
	losetup --find --show SCRATCH.img
	mkfs.xfs -f /dev/loop0
	mkfs.xfs -f /dev/loop1
	mkdir -p TEST SCRATCH
	mount /dev/loop0 TEST
	mount /dev/loop1 SCRATCH

	TEST_DEV=/dev/loop0 TEST_DIR=TEST SCRATCH_DEV=/dev/loop1 SCRATCH_MNT=SCRATCH \
		./check `grep -il freeze tests/*/???`

several times, and every time the result looks like below:

	FSTYP         -- xfs (non-debug)
	PLATFORM      -- Linux/x86_64 intel-canoepass-10 4.1.0+
	MKFS_OPTIONS  -- -f -bsize=4096 /dev/loop1
	MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/loop1 SCRATCH

	generic/068	 59s ...  61s
	generic/085	 23s ...  26s
	generic/280	  2s ...   3s
	generic/311	169s ... 167s
	xfs/011		 21s ...  20s
	xfs/119		 32s ...  21s
	xfs/297		455s ... 301s
	Ran: generic/068 generic/085 generic/280 generic/311 xfs/011 xfs/119 xfs/297
	Passed all 7 tests

But with CONFIG_LOCKDEP=y, generic/068 triggers the warning:

[ 66.092831] ======================================================
[ 66.099726] [ INFO: possible circular locking dependency detected ]
[ 66.106719] 4.1.0+ #2 Not tainted
[ 66.110414] -------------------------------------------------------
[ 66.117405] fsstress/2210 is trying to acquire lock:
[ 66.122944]  (&mm->mmap_sem){++++++}, at: [<ffffffff81200562>] might_fault+0x42/0xa0
[ 66.131637] but task is already holding lock:
[ 66.138146]  (&xfs_dir_ilock_class){++++..}, at: [<ffffffffa057da52>] xfs_ilock+0xc2/0x170 [xfs]
[ 66.148022] which lock already depends on the new lock.
[ 66.157141] the existing dependency chain (in reverse order) is:

[ 66.165490] -> #4 (&xfs_dir_ilock_class){++++..}:
[ 66.170974]        [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
[ 66.177596]        [<ffffffff810f4bbc>] down_write_nested+0x3c/0x70
[ 66.184605]        [<ffffffffa057dab6>] xfs_ilock+0x126/0x170 [xfs]
[ 66.191645]        [<ffffffffa057c4da>] xfs_setattr_nonsize+0x3ba/0x5d0 [xfs]
[ 66.199638]        [<ffffffffa057cb5a>] xfs_vn_setattr+0x9a/0xd0 [xfs]
[ 66.206950]        [<ffffffff81279411>] notify_change+0x271/0x3a0
[ 66.213762]        [<ffffffff8125391b>] chown_common.isra.11+0x15b/0x200
[ 66.221251]        [<ffffffff812551a6>] SyS_lchown+0xa6/0xf0
[ 66.227576]        [<ffffffff8182662e>] system_call_fastpath+0x12/0x76

[ 66.234878] -> #3 (sb_internal#2){++++++}:
[ 66.239698]        [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
[ 66.246316]        [<ffffffff81824256>] down_write+0x36/0x70
[ 66.252644]        [<ffffffff81100979>] percpu_down_write+0x39/0x110
[ 66.259760]        [<ffffffff8125901d>] freeze_super+0xbd/0x190
[ 66.266369]        [<ffffffff8126dbc8>] do_vfs_ioctl+0x3f8/0x520
[ 66.273082]        [<ffffffff8126dd71>] SyS_ioctl+0x81/0xa0
[ 66.279311]        [<ffffffff8182662e>] system_call_fastpath+0x12/0x76

[ 66.286610] -> #2 (sb_pagefaults#2){++++++}:
[ 66.291623]        [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
[ 66.298237]        [<ffffffff811007c1>] percpu_down_read+0x51/0xa0
[ 66.305144]        [<ffffffff8125979b>] __sb_start_write+0xdb/0x120
[ 66.312140]        [<ffffffff81295eda>] block_page_mkwrite+0x3a/0xb0
[ 66.319245]        [<ffffffffa057046e>] xfs_filemap_page_mkwrite+0x5e/0xb0 [xfs]
[ 66.327533]        [<ffffffff812009a8>] do_page_mkwrite+0x58/0x100
[ 66.334433]        [<ffffffff81205497>] handle_mm_fault+0x537/0x17c0
[ 66.341533]        [<ffffffff8106a03c>] __do_page_fault+0x19c/0x450
[ 66.348542]        [<ffffffff8106a31f>] do_page_fault+0x2f/0x80
[ 66.355172]        [<ffffffff818286e8>] page_fault+0x28/0x30

[ 66.361493] -> #1 (&(&ip->i_mmaplock)->mr_lock){++++++}:
[ 66.367659]        [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
[ 66.374275]        [<ffffffff810f4aef>] down_read_nested+0x3f/0x60
[ 66.381185]        [<ffffffffa057daf8>] xfs_ilock+0x168/0x170 [xfs]
[ 66.388212]        [<ffffffffa057050c>] xfs_filemap_fault+0x4c/0xb0 [xfs]
[ 66.395817]        [<ffffffff81200a9e>] __do_fault+0x4e/0x100
[ 66.402239]        [<ffffffff81205454>] handle_mm_fault+0x4f4/0x17c0
[ 66.409340]        [<ffffffff8106a03c>] __do_page_fault+0x19c/0x450
[ 66.416344]        [<ffffffff8106a31f>] do_page_fault+0x2f/0x80
[ 66.422959]        [<ffffffff818286e8>] page_fault+0x28/0x30

[ 66.429283] -> #0 (&mm->mmap_sem){++++++}:
[ 66.434093]        [<ffffffff810fb867>] __lock_acquire+0x20d7/0x2100
[ 66.441194]        [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
[ 66.447808]        [<ffffffff8120058f>] might_fault+0x6f/0xa0
[ 66.454235]        [<ffffffff8126e05a>] filldir+0x9a/0x130
[ 66.460373]        [<ffffffffa056daed>] xfs_dir2_sf_getdents.isra.10+0x1bd/0x220 [xfs]
[ 66.469233]        [<ffffffffa056e72e>] xfs_readdir+0x17e/0x1a0 [xfs]
[ 66.476449]        [<ffffffffa057095b>] xfs_file_readdir+0x2b/0x30 [xfs]
[ 66.483954]        [<ffffffff8126de2a>] iterate_dir+0x9a/0x140
[ 66.490473]        [<ffffffff8126e361>] SyS_getdents+0x91/0x120
[ 66.497091]        [<ffffffff8182662e>] system_call_fastpath+0x12/0x76

[ 66.504381] other info that might help us debug this:

[ 66.513316] Chain exists of:
               &mm->mmap_sem --> sb_internal#2 --> &xfs_dir_ilock_class

[ 66.522619]  Possible unsafe locking scenario:

[ 66.529222]        CPU0                    CPU1
[ 66.534275]        ----                    ----
[ 66.539327]   lock(&xfs_dir_ilock_class);
[ 66.543823]                                lock(sb_internal#2);
[ 66.550465]                                lock(&xfs_dir_ilock_class);
[ 66.557767]   lock(&mm->mmap_sem);
[ 66.561580]
                *** DEADLOCK ***

[ 66.568186] 2 locks held by fsstress/2210:
[ 66.572753]  #0:  (&type->i_mutex_dir_key#5){+.+.+.}, at: [<ffffffff8126ddf1>] iterate_dir+0x61/0x140
[ 66.583103]  #1:  (&xfs_dir_ilock_class){++++..}, at: [<ffffffffa057da52>] xfs_ilock+0xc2/0x170 [xfs]

[ 66.593457] stack backtrace:
[ 66.598321] CPU: 26 PID: 2210 Comm: fsstress Not tainted 4.1.0+ #2
[ 66.605215] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
[ 66.616372]  0000000000000000 0000000048bb9c77 ffff8817fa127ad8 ffffffff8181d9dd
[ 66.624663]  0000000000000000 ffffffff8288f500 ffff8817fa127b28 ffffffff810f7b26
[ 66.632955]  0000000000000002 ffff8817fa127b98 ffff8817fa127b28 ffff881803f06480
[ 66.641249] Call Trace:
[ 66.643983]  [<ffffffff8181d9dd>] dump_stack+0x45/0x57
[ 66.649713]  [<ffffffff810f7b26>] print_circular_bug+0x206/0x280
[ 66.656413]  [<ffffffff810fb867>] __lock_acquire+0x20d7/0x2100
[ 66.662919]  [<ffffffff810fa1b7>] ? __lock_acquire+0xa27/0x2100
[ 66.669523]  [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
[ 66.675543]  [<ffffffff81200562>] ? might_fault+0x42/0xa0
[ 66.681566]  [<ffffffff8120058f>] might_fault+0x6f/0xa0
[ 66.687394]  [<ffffffff81200562>] ? might_fault+0x42/0xa0
[ 66.693418]  [<ffffffff8126e05a>] filldir+0x9a/0x130
[ 66.698968]  [<ffffffffa057db34>] ? xfs_ilock_data_map_shared+0x34/0x40 [xfs]
[ 66.706941]  [<ffffffffa056daed>] xfs_dir2_sf_getdents.isra.10+0x1bd/0x220 [xfs]
[ 66.715203]  [<ffffffffa057da52>] ? xfs_ilock+0xc2/0x170 [xfs]
[ 66.721712]  [<ffffffffa056e72e>] xfs_readdir+0x17e/0x1a0 [xfs]
[ 66.728316]  [<ffffffff81822b6d>] ? mutex_lock_killable_nested+0x3ad/0x480
[ 66.735998]  [<ffffffffa057095b>] xfs_file_readdir+0x2b/0x30 [xfs]
[ 66.742894]  [<ffffffff8126de2a>] iterate_dir+0x9a/0x140
[ 66.748819]  [<ffffffff8127a1ac>] ? __fget_light+0x6c/0xa0
[ 66.754938]  [<ffffffff8126e361>] SyS_getdents+0x91/0x120
[ 66.760958]  [<ffffffff8126dfc0>] ? fillonedir+0xf0/0xf0
[ 66.766884]  [<ffffffff8182662e>] system_call_fastpath+0x12/0x76

The chain reported by lockdep is not exactly the same every time, but it
is similar. Once again, I'll re-check, but the patch below still looks
correct to me, and I think it is actually a fix. Before this patch,
freeze_super() calls sync_filesystem() and s_op->freeze_fs(sb) without
the ->s_writers locks held (as seen by lockdep), and this is wrong. After
this patch, lockdep knows about the ->s_writers locks we hold, and this
is what we want, but apparently this also allows lockdep to notice the
new dependencies.
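To make that concrete, here is a condensed sketch of the annotation flow
before and after the patch. This is not the actual fs/super.c code; error
paths and the writer wait loop are elided, see the patch itself below:

	/*
	 * Before: sb_wait_write() cycles the lockdep map, so lockdep
	 * thinks the "write" lock is already dropped by the time we
	 * call sync_filesystem() and s_op->freeze_fs(sb).
	 */
	static void sb_wait_write(struct super_block *sb, int level)
	{
		rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_);
		rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_);
		/* ... wait until the writers at this level drain ... */
	}

	/*
	 * After: only the acquire stays in sb_wait_write(); all levels
	 * are released in one place at the end of freeze_super(), so the
	 * whole freeze path runs with the locks visibly held by lockdep.
	 */
	static void sb_freeze_release(struct super_block *sb)
	{
		int level;

		/* Avoid the warning from lockdep_sys_exit() */
		for (level = 0; level < SB_FREEZE_LEVELS; ++level)
			rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
	}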
Oleg.

------------------------------------------------------------------------------
Subject: [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super()

Move the "fool the lockdep" code from sb_wait_write() into the new
simple helper, sb_freeze_release(), called once before return from
freeze_super().

This is preparation, but imo this makes sense in any case. This way
we do not hide from lockdep the "write" locks we hold when we call
s_op->freeze_fs(sb).

Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
---
 fs/super.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index d0fdd49..e7ea9f1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1236,16 +1236,10 @@ static void sb_wait_write(struct super_block *sb, int level)
 {
 	s64 writers;
 
-	/*
-	 * We just cycle-through lockdep here so that it does not complain
-	 * about returning with lock to userspace
-	 */
 	rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_);
-	rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_);
 
 	do {
 		DEFINE_WAIT(wait);
-
 		/*
 		 * We use a barrier in prepare_to_wait() to separate setting
 		 * of frozen and checking of the counter
@@ -1261,6 +1255,14 @@ static void sb_wait_write(struct super_block *sb, int level)
 	} while (writers);
 }
 
+static void sb_freeze_release(struct super_block *sb)
+{
+	int level;
+	/* Avoid the warning from lockdep_sys_exit() */
+	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
+		rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
+}
+
 /**
  * freeze_super - lock the filesystem and force it into a consistent state
  * @sb: the super to lock
@@ -1349,6 +1351,7 @@ int freeze_super(struct super_block *sb)
 		sb->s_writers.frozen = SB_UNFROZEN;
 		smp_wmb();
 		wake_up(&sb->s_writers.wait_unfrozen);
+		sb_freeze_release(sb);
 		deactivate_locked_super(sb);
 		return ret;
 	}
@@ -1358,6 +1361,7 @@ int freeze_super(struct super_block *sb)
 	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+	sb_freeze_release(sb);
 	up_write(&sb->s_umount);
 	return 0;
 }
-- 
1.5.5.1
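P.S. For reference, the ioctl(FIFREEZE) + ioctl(FITHAW) check mentioned
at the top can be driven by a trivial userspace helper along the lines
below. This is a hypothetical sketch, not the exact test that was run;
it needs CAP_SYS_ADMIN and the path of a mounted filesystem:

	/* freeze-thaw.c: freeze the fs backing <mountpoint>, then thaw it. */
	#include <stdio.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/fs.h>	/* FIFREEZE, FITHAW */

	int main(int argc, char **argv)
	{
		int fd;

		if (argc != 2) {
			fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
			return 1;
		}

		fd = open(argv[1], O_RDONLY);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		if (ioctl(fd, FIFREEZE, 0) != 0) {	/* block writers, freeze fs */
			perror("FIFREEZE");
			return 1;
		}

		/* the filesystem is frozen here */

		if (ioctl(fd, FITHAW, 0) != 0) {	/* unfreeze */
			perror("FITHAW");
			return 1;
		}

		close(fd);
		return 0;
	}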