Re: [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held

On 5/23/22 5:51 PM, Guoqing Jiang wrote:


I noticed a clear regression with mdadm tests with this patch in md-next
(7e6ba434cc6080).

Before the patch, tests 07reshape5intr and 07revert-grow would fail
fairly infrequently (about 1 in 4 runs for the former and 1 in 30 runs
for the latter).

After this patch, both tests always fail.

I don't have time to dig into why this is, but it would be nice if
someone can at least fix the regression. It is hard to make any progress
on these tests if we are continuing to further break them.

I have tried with both the Ubuntu 22.04 kernel (which is 5.15) and vanilla 5.12; neither of them
can pass the tests you mentioned.

[root@localhost mdadm]# lsblk|grep vd
vda          252:0    0    1G  0 disk
vdb          252:16   0    1G  0 disk
vdc          252:32   0    1G  0 disk
vdd          252:48   0    1G  0 disk
[root@localhost mdadm]# ./test --dev=disk --disks=/dev/vd{a..d} --tests=05r1-add-internalbitmap
Testing on linux-5.12.0-default kernel
/root/mdadm/tests/05r1-add-internalbitmap... succeeded
[root@localhost mdadm]# ./test --dev=disk --disks=/dev/vd{a..d} --tests=07reshape5intr
Testing on linux-5.12.0-default kernel
/root/mdadm/tests/07reshape5intr... FAILED - see /var/tmp/07reshape5intr.log and /var/tmp/fail07reshape5intr.log for details
[root@localhost mdadm]# ./test --dev=disk --disks=/dev/vd{a..d} --tests=07revert-grow
Testing on linux-5.12.0-default kernel
/root/mdadm/tests/07revert-grow... FAILED - see /var/tmp/07revert-grow.log and /var/tmp/fail07revert-grow.log for details
[root@localhost mdadm]# head -10  /var/tmp/07revert-grow.log | grep mdadm
+ . /root/mdadm/tests/07revert-grow
++ mdadm -CR --assume-clean /dev/md0 -l5 -n4 -x1 /dev/vda /dev/vdb /dev/vdc /dev/vdd /dev/vda /dev/vdb /dev/vdc /dev/vdd --metadata=0.9
The above line is clearly wrong from my understanding (each disk is passed twice).

And let's check Ubuntu 22.04.

root@vm:/home/gjiang/mdadm# lsblk|grep vd
vda    252:0    0     1G  0 disk
vdb    252:16   0     1G  0 disk
vdc    252:32   0     1G  0 disk
root@vm:/home/gjiang/mdadm# ./test --dev=disk --disks=/dev/vd{a..d} --tests=05r1-failfast
Testing on linux-5.15.0-30-generic kernel
/home/gjiang/mdadm/tests/05r1-failfast... succeeded
root@vm:/home/gjiang/mdadm# ./test --dev=disk --disks=/dev/vd{a..c}   --tests=07reshape5intr
Testing on linux-5.15.0-30-generic kernel
/home/gjiang/mdadm/tests/07reshape5intr... FAILED - see /var/tmp/07reshape5intr.log and /var/tmp/fail07reshape5intr.log for details
root@vm:/home/gjiang/mdadm# ./test --dev=disk --disks=/dev/vd{a..c} --tests=07revert-grow
Testing on linux-5.15.0-30-generic kernel
/home/gjiang/mdadm/tests/07revert-grow... FAILED - see /var/tmp/07revert-grow.log and /var/tmp/fail07revert-grow.log for details

So I would not consider it a regression.

I tried with 5.18.0-rc3: no problem with 07reshape5intr (I will investigate why it failed with this patch), but 07revert-grow still failed without
applying this one.

fail07revert-grow.log shows the issues below.

[ 7856.233515] mdadm[25246]: segfault at 0 ip 000000000040fe56 sp 00007ffdcf252800 error 4 in mdadm[400000+81000]
[ 7856.233544] Code: 00 48 8d 7c 24 30 e8 79 30 ff ff 48 8d 7c 24 30 31 f6 31 c0 e8 db 34 ff ff 85 c0 79 77 bf 26 50 46 00 b9 04 00 00 00 48 89 de <f3> a6 0f 97 c0 1c 00 84 c0 75 18 e8 fa 36 ff ff 48 0f be 53 04 48

[ 7866.871747] mdadm[25463]: segfault at 0 ip 000000000040fe56 sp 00007ffe91e39800 error 4 in mdadm[400000+81000]
[ 7866.871760] Code: 00 48 8d 7c 24 30 e8 79 30 ff ff 48 8d 7c 24 30 31 f6 31 c0 e8 db 34 ff ff 85 c0 79 77 bf 26 50 46 00 b9 04 00 00 00 48 89 de <f3> a6 0f 97 c0 1c 00 84 c0 75 18 e8 fa 36 ff ff 48 0f be 53 04 48

[ 7876.779855] ======================================================
[ 7876.779858] WARNING: possible circular locking dependency detected
[ 7876.779861] 5.18.0-rc3-57-default #28 Tainted: G            E
[ 7876.779864] ------------------------------------------------------
[ 7876.779867] mdadm/25444 is trying to acquire lock:
[ 7876.779870] ffff991817749938 ((wq_completion)md_misc){+.+.}-{0:0}, at: flush_workqueue+0x87/0x470
[ 7876.779879]
               but task is already holding lock:
[ 7876.779882] ffff9917c5c1c2c0 (&mddev->reconfig_mutex){+.+.}-{3:3}, at: action_store+0x11a/0x2c0 [md_mod]
[ 7876.779892]
               which lock already depends on the new lock.

[ 7876.779896]
               the existing dependency chain (in reverse order) is:
[ 7876.779899]
               -> #3 (&mddev->reconfig_mutex){+.+.}-{3:3}:
[ 7876.779904]        __mutex_lock+0x8f/0x920
[ 7876.779909]        layout_store+0x47/0x120 [md_mod]
[ 7876.779914]        md_attr_store+0x7a/0xc0 [md_mod]
[ 7876.779919]        kernfs_fop_write_iter+0x135/0x1b0
[ 7876.779924]        new_sync_write+0x10c/0x190
[ 7876.779927]        vfs_write+0x30e/0x370
[ 7876.779930]        ksys_write+0xa4/0xe0
[ 7876.779933]        do_syscall_64+0x3a/0x80
[ 7876.779936]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 7876.779940]
               -> #2 (kn->active#359){++++}-{0:0}:
[ 7876.779945]        __kernfs_remove+0x28c/0x2e0
[ 7876.779948]        kernfs_remove_by_name_ns+0x52/0x90
[ 7876.779952]        remove_files.isra.1+0x30/0x70
[ 7876.779955]        sysfs_remove_group+0x3d/0x80
[ 7876.779958]        sysfs_remove_groups+0x29/0x40
[ 7876.779962]        __kobject_del+0x1b/0x80
[ 7876.779965]        kobject_del+0xf/0x20
[ 7876.779968]        mddev_delayed_delete+0x15/0x20 [md_mod]
[ 7876.779973]        process_one_work+0x2d8/0x650
[ 7876.779976]        worker_thread+0x2a/0x3b0
[ 7876.779979]        kthread+0xe8/0x110
[ 7876.779981]        ret_from_fork+0x22/0x30
[ 7876.779985]
               -> #1 ((work_completion)(&mddev->del_work)#2){+.+.}-{0:0}:
[ 7876.779990]        process_one_work+0x2af/0x650
[ 7876.779993]        worker_thread+0x2a/0x3b0
[ 7876.779996]        kthread+0xe8/0x110
[ 7876.779998]        ret_from_fork+0x22/0x30
[ 7876.780001]
               -> #0 ((wq_completion)md_misc){+.+.}-{0:0}:
[ 7876.780005]        __lock_acquire+0x12a0/0x1770
[ 7876.780009]        lock_acquire+0x277/0x310
[ 7876.780011]        flush_workqueue+0xae/0x470
[ 7876.780014]        action_store+0x188/0x2c0 [md_mod]
[ 7876.780019]        md_attr_store+0x7a/0xc0 [md_mod]
[ 7876.780024]        kernfs_fop_write_iter+0x135/0x1b0
[ 7876.780027]        new_sync_write+0x10c/0x190
[ 7876.780030]        vfs_write+0x30e/0x370
[ 7876.780033]        ksys_write+0xa4/0xe0
[ 7876.780036]        do_syscall_64+0x3a/0x80
[ 7876.780038]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 7876.780042]
               other info that might help us debug this:

[ 7876.780045] Chain exists of:
                 (wq_completion)md_misc --> kn->active#359 --> &mddev->reconfig_mutex

[ 7876.780052]  Possible unsafe locking scenario:

[ 7876.780054]        CPU0                    CPU1
[ 7876.780056]        ----                    ----
[ 7876.780059]   lock(&mddev->reconfig_mutex);
[ 7876.780061]                                lock(kn->active#359);
[ 7876.780064]                                lock(&mddev->reconfig_mutex);
[ 7876.780068]   lock((wq_completion)md_misc);
[ 7876.780070]
                *** DEADLOCK ***

[ 7876.780073] 5 locks held by mdadm/25444:
[ 7876.780075]  #0: ffff9917c769a458 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xa4/0xe0
[ 7876.780082]  #1: ffff9917de1a6518 (&of->prealloc_mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x5c/0x1b0
[ 7876.780088]  #2: ffff9917de1a6488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x103/0x1b0
[ 7876.780094]  #3: ffff99181b0d2158 (kn->active#287){++++}-{0:0}, at: kernfs_fop_write_iter+0x10c/0x1b0
[ 7876.780100]  #4: ffff9917c5c1c2c0 (&mddev->reconfig_mutex){+.+.}-{3:3}, at: action_store+0x11a/0x2c0 [md_mod]
[ 7876.780108]
               stack backtrace:
[ 7876.780111] CPU: 1 PID: 25444 Comm: mdadm Tainted: G E     5.18.0-rc3-57-default #28
[ 7876.780115] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[ 7876.780120] Call Trace:
[ 7876.780122]  <TASK>
[ 7876.780124]  dump_stack_lvl+0x55/0x6d
[ 7876.780128]  check_noncircular+0x105/0x120
[ 7876.780133]  ? __lock_acquire+0x12a0/0x1770
[ 7876.780136]  ? lockdep_lock+0x21/0x90
[ 7876.780139]  __lock_acquire+0x12a0/0x1770
[ 7876.780143]  lock_acquire+0x277/0x310
[ 7876.780146]  ? flush_workqueue+0x87/0x470
[ 7876.780149]  ? __raw_spin_lock_init+0x3b/0x60
[ 7876.780152]  ? lockdep_init_map_type+0x58/0x250
[ 7876.780156]  flush_workqueue+0xae/0x470
[ 7876.780158]  ? flush_workqueue+0x87/0x470
[ 7876.780161]  ? _raw_spin_unlock+0x29/0x40
[ 7876.780166]  ? action_store+0x188/0x2c0 [md_mod]
[ 7876.780171]  action_store+0x188/0x2c0 [md_mod]
[ 7876.780176]  md_attr_store+0x7a/0xc0 [md_mod]
[ 7876.780182]  kernfs_fop_write_iter+0x135/0x1b0
[ 7876.780186]  new_sync_write+0x10c/0x190
[ 7876.780190]  vfs_write+0x30e/0x370
[ 7876.780193]  ksys_write+0xa4/0xe0
[ 7876.780197]  do_syscall_64+0x3a/0x80
[ 7876.780200]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 7876.780203] RIP: 0033:0x7f7cd43f9c03
[ 7876.780206] Code: 2d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 f3 c3 0f 1f 00 41 54 55 49 89 d4 53 48 89
[ 7876.780213] RSP: 002b:00007fff35b600b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 7876.780218] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f7cd43f9c03
[ 7876.780221] RDX: 0000000000000004 RSI: 00000000004669a7 RDI: 0000000000000003
[ 7876.780224] RBP: 00000000004669a7 R08: 000000000000000b R09: 00000000ffffffff
[ 7876.780227] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000183d730
[ 7876.780231] R13: 00000000ffffffff R14: 0000000000001800 R15: 0000000000000000
[ 7876.780236]  </TASK>
[ 7876.781007] md: reshape of RAID array md0
[ 7876.785949] md: md0: reshape interrupted.
[ 7877.193686] md0: detected capacity change from 107520 to 0
[ 7877.193694] md: md0 stopped.
[ 7877.314438] debugfs: Directory 'md0' with parent 'block' already present!
[ 7877.649549] md: array md0 already has disks!
[ 7877.665188] md/raid:md0: device loop0 operational as raid disk 0
[ 7877.665193] md/raid:md0: device loop3 operational as raid disk 3
[ 7877.665197] md/raid:md0: device loop2 operational as raid disk 2
[ 7877.665199] md/raid:md0: device loop1 operational as raid disk 1
[ 7877.665202] md/raid:md0: device loop4 operational as raid disk 4
[ 7877.665205] md/raid:md0: force stripe size 512 for reshape
[ 7877.667172] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[ 7877.667360] md0: detected capacity change from 0 to 107520
[ 7878.107865] mdadm[25693]: segfault at 0 ip 000000000040fe56 sp 00007fff7a845d50 error 4 in mdadm[400000+81000]
[ 7878.107878] Code: 00 48 8d 7c 24 30 e8 79 30 ff ff 48 8d 7c 24 30 31 f6 31 c0 e8 db 34 ff ff 85 c0 79 77 bf 26 50 46 00 b9 04 00 00 00 48 89 de <f3> a6 0f 97 c0 1c 00 84 c0 75 18 e8 fa 36 ff ff 48 0f be 53 04 48
[ 7878.428298] md: reshape of RAID array md0
[ 7880.552543] md: md0: reshape done.
[ 7882.053117] md0: detected capacity change from 107520 to 0
[ 7882.053124] md: md0 stopped.
[ ... ]
[ 7904.978417] md0: detected capacity change from 0 to 107520
[ 7905.346749] mdadm[26328]: segfault at 0 ip 000000000040fe56 sp 00007ffc28540c20 error 4 in mdadm[400000+81000]
[ 7905.346764] Code: 00 48 8d 7c 24 30 e8 79 30 ff ff 48 8d 7c 24 30 31 f6 31 c0 e8 db 34 ff ff 85 c0 79 77 bf 26 50 46 00 b9 04 00 00 00 48 89 de <f3> a6 0f 97 c0 1c 00 84 c0 75 18 e8 fa 36 ff ff 48 0f be 53 04 48
[ 7905.476488] md: reshape of RAID array md0
[ 7907.530152] md: md0: reshape done.
[ 7909.149524] md0: detected capacity change from 107520 to 0
[ 7909.149533] md: md0 stopped.
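For reference, the dependency chain in the lockdep report above corresponds roughly to the following pattern (a simplified, non-compilable sketch of my reading of the trace, not the actual md code paths):

```c
/*
 * Simplified sketch of the reported inversion (pseudocode based on
 * the lockdep chain above, not the real md source):
 *
 * CPU1 (mdadm writing "sync_action"):
 *   action_store()
 *     mutex_lock(&mddev->reconfig_mutex);   // lock #4 in the held-locks list
 *     flush_workqueue(md_misc_wq);          // waits for del_work to finish
 *
 * md_misc workqueue worker:
 *   mddev_delayed_delete()
 *     kobject_del()                         // removes sysfs files and waits
 *                                           // for active refs (kn->active)
 *
 * CPU0 (mdadm writing "layout" through sysfs):
 *   layout_store()                          // runs holding a kn->active ref
 *     mutex_lock(&mddev->reconfig_mutex);   // wants the mutex CPU1 holds
 */
```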

Thanks,
Guoqing


