Hello Neil, I believe I have found what is causing the deadlock. It happens in two flavors: 1) # raid1d() is called, and conf->pending_bio_list is non-empty at this point # raid1d() calls md_check_recovery(), which eventually calls raid1_add_disk(), which calls raise_barrier() # Now raise_barrier will wait for conf->nr_pending to become 0, but it cannot become 0, because there are bios sitting in conf->pending_bio_list, which nobody will flush, because raid1d is the one supposed to call flush_pending_writes(), either directly or via handle_read_error. But it is stuck in raise_barrier. 2) # raid1_add_disk() calls raise_barrier(), and waits for conf->nr_pending to become 0, as before # new WRITE comes and calls wait_barrier(), but this thread has a non-empty current->bio_list # In this case, the code allows the WRITE to go through wait_barrier(), and trigger WRITEs to mirror legs, but these WRITEs again end up in conf->pending_bio_list (either via raid1_unplug or directly). But nobody will flush conf->pending_bio_list, because raid1d is stuck in raise_barrier. Previously, for example in kernel 3.2, raid1_add_disk did not call raise_barrier, so this problem did not happen. Attached is a reproduction with some prints that I added to raise_barrier and wait_barrier (their code also attached). It demonstrates case 2. It shows that once raise_barrier got called, conf->nr_pending drops down, until it equals the number of wait_barrier calls, that slipped through because of non-empty current->bio_list. And at this point, this array hangs. Can you please comment on how to fix this problem. It looks like a real deadlock. We can perhaps call md_check_recovery() after flush_pending_writes(), but this only makes the window smaller, not closes it entirely. But it looks like we really should not be calling raise_barrier from raid1d. Thanks, Alex. On Thu, May 30, 2013 at 5:30 PM, Alexander Lyakas <alex.bolshoy@xxxxxxxxx> wrote: > Hi Neil, > > On Mon, May 20, 2013 at 10:17 AM, NeilBrown <neilb@xxxxxxx> wrote: >> On Thu, 16 May 2013 17:07:04 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx> >> wrote: >> >>> Hello Neil, >>> we are hitting issue that looks very similar; we are on kernel 3.8.2. >>> The array is a 2-device raid1, which experienced a drive failure, but >>> then drive was removed and re-added back to the array (although >>> rebuild never started). Relevant parts of the kernel log: >>> >>> May 16 11:12:14 kernel: [46850.090499] md/raid1:md2: Disk failure on >>> dm-6, disabling device. >>> May 16 11:12:14 kernel: [46850.090499] md/raid1:md2: Operation >>> continuing on 1 devices. >>> May 16 11:12:14 kernel: [46850.090511] md: super_written gets >>> error=-5, uptodate=0 >>> May 16 11:12:14 kernel: [46850.090676] md/raid1:md2: dm-6: >>> rescheduling sector 18040736 >>> May 16 11:12:14 kernel: [46850.090764] md/raid1:md2: dm-6: >>> rescheduling sector 20367040 >>> May 16 11:12:14 kernel: [46850.090826] md/raid1:md2: dm-6: >>> rescheduling sector 17613504 >>> May 16 11:12:14 kernel: [46850.090883] md/raid1:md2: dm-6: >>> rescheduling sector 18042720 >>> May 16 11:12:15 kernel: [46850.229970] md/raid1:md2: redirecting >>> sector 18040736 to other mirror: dm-13 >>> May 16 11:12:15 kernel: [46850.257687] md/raid1:md2: redirecting >>> sector 20367040 to other mirror: dm-13 >>> May 16 11:12:15 kernel: [46850.268731] md/raid1:md2: redirecting >>> sector 17613504 to other mirror: dm-13 >>> May 16 11:12:15 kernel: [46850.398242] md/raid1:md2: redirecting >>> sector 18042720 to other mirror: dm-13 >>> May 16 11:12:23 kernel: [46858.448465] md: unbind<dm-6> >>> May 16 11:12:23 kernel: [46858.456081] md: export_rdev(dm-6) >>> May 16 11:23:19 kernel: [47515.062547] md: bind<dm-6> >>> >>> May 16 11:24:28 kernel: [47583.920126] INFO: task md2_raid1:15749 >>> blocked for more than 60 seconds. >>> May 16 11:24:28 kernel: [47583.921829] "echo 0 > >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>> May 16 11:24:28 kernel: [47583.923361] md2_raid1 D >>> 0000000000000001 0 15749 2 0x00000000 >>> May 16 11:24:28 kernel: [47583.923367] ffff880097c23c28 >>> 0000000000000046 ffff880000000002 00000000982c43b8 >>> May 16 11:24:28 kernel: [47583.923372] ffff880097c23fd8 >>> ffff880097c23fd8 ffff880097c23fd8 0000000000013f40 >>> May 16 11:24:28 kernel: [47583.923376] ffff880119b11740 >>> ffff8800a5489740 ffff880097c23c38 ffff88011609d3c0 >>> May 16 11:24:28 kernel: [47583.923381] Call Trace: >>> May 16 11:24:28 kernel: [47583.923395] [<ffffffff816eb399>] schedule+0x29/0x70 >>> May 16 11:24:28 kernel: [47583.923402] [<ffffffffa0516736>] >>> raise_barrier+0x106/0x160 [raid1] >>> May 16 11:24:28 kernel: [47583.923410] [<ffffffff8107fcc0>] ? >>> add_wait_queue+0x60/0x60 >>> May 16 11:24:28 kernel: [47583.923415] [<ffffffffa0516af7>] >>> raid1_add_disk+0x197/0x200 [raid1] >>> May 16 11:24:28 kernel: [47583.923421] [<ffffffff81567fa4>] >>> remove_and_add_spares+0x104/0x220 >>> May 16 11:24:28 kernel: [47583.923426] [<ffffffff8156a02d>] >>> md_check_recovery.part.49+0x40d/0x530 >>> May 16 11:24:28 kernel: [47583.923429] [<ffffffff8156a165>] >>> md_check_recovery+0x15/0x20 >>> May 16 11:24:28 kernel: [47583.923433] [<ffffffffa0517e42>] >>> raid1d+0x22/0x180 [raid1] >>> May 16 11:24:28 kernel: [47583.923439] [<ffffffff81045cd9>] ? >>> default_spin_lock_flags+0x9/0x10 >>> May 16 11:24:28 kernel: [47583.923443] [<ffffffff81045cd9>] ? >>> default_spin_lock_flags+0x9/0x10 >>> May 16 11:24:28 kernel: [47583.923449] [<ffffffff815624ed>] >>> md_thread+0x10d/0x140 >>> May 16 11:24:28 kernel: [47583.923453] [<ffffffff8107fcc0>] ? >>> add_wait_queue+0x60/0x60 >>> May 16 11:24:28 kernel: [47583.923457] [<ffffffff815623e0>] ? >>> md_rdev_init+0x140/0x140 >>> May 16 11:24:28 kernel: [47583.923461] [<ffffffff8107f0d0>] kthread+0xc0/0xd0 >>> May 16 11:24:28 kernel: [47583.923465] [<ffffffff8107f010>] ? >>> flush_kthread_worker+0xb0/0xb0 >>> May 16 11:24:28 kernel: [47583.923472] [<ffffffff816f506c>] >>> ret_from_fork+0x7c/0xb0 >>> May 16 11:24:28 kernel: [47583.923476] [<ffffffff8107f010>] ? >>> flush_kthread_worker+0xb0/0xb0 >>> >>> dm-13 is the good drive, dm-6 is the one that failed. >>> >>> At this point, we have several threads calling submit_bio and all >>> stuck like this: >>> >>> cat /proc/16218/stack >>> [<ffffffffa0516d6e>] wait_barrier+0xbe/0x160 [raid1] >>> [<ffffffffa0518627>] make_request+0x87/0xa90 [raid1] >>> [<ffffffff81561ed0>] md_make_request+0xd0/0x200 >>> [<ffffffff8132bcaa>] generic_make_request+0xca/0x100 >>> [<ffffffff8132bd5b>] submit_bio+0x7b/0x160 >>> ... >>> >>> And md raid1 thread stuck like this: >>> >>> cat /proc/15749/stack >>> [<ffffffffa0516736>] raise_barrier+0x106/0x160 [raid1] >>> [<ffffffffa0516af7>] raid1_add_disk+0x197/0x200 [raid1] >>> [<ffffffff81567fa4>] remove_and_add_spares+0x104/0x220 >>> [<ffffffff8156a02d>] md_check_recovery.part.49+0x40d/0x530 >>> [<ffffffff8156a165>] md_check_recovery+0x15/0x20 >>> [<ffffffffa0517e42>] raid1d+0x22/0x180 [raid1] >>> [<ffffffff815624ed>] md_thread+0x10d/0x140 >>> [<ffffffff8107f0d0>] kthread+0xc0/0xd0 >>> [<ffffffff816f506c>] ret_from_fork+0x7c/0xb0 >>> [<ffffffffffffffff>] 0xffffffffffffffff >>> >>> We have also two user-space threads stuck: >>> >>> one is trying to read /sys/block/md2/md/array_state and its kernel stack is: >>> # cat /proc/2251/stack >>> [<ffffffff81564602>] md_attr_show+0x72/0xf0 >>> [<ffffffff8120f116>] fill_read_buffer.isra.8+0x66/0xf0 >>> [<ffffffff8120f244>] sysfs_read_file+0xa4/0xc0 >>> [<ffffffff8119b0d0>] vfs_read+0xb0/0x180 >>> [<ffffffff8119b1f2>] sys_read+0x52/0xa0 >>> [<ffffffff816f511d>] system_call_fastpath+0x1a/0x1f >>> [<ffffffffffffffff>] 0xffffffffffffffff >>> >>> the other wants to read from /proc/mdstat and is: >>> [<ffffffff81563d2b>] md_seq_show+0x4b/0x540 >>> [<ffffffff811bdd1b>] seq_read+0x16b/0x400 >>> [<ffffffff811ff572>] proc_reg_read+0x82/0xc0 >>> [<ffffffff8119b0d0>] vfs_read+0xb0/0x180 >>> [<ffffffff8119b1f2>] sys_read+0x52/0xa0 >>> [<ffffffff816f511d>] system_call_fastpath+0x1a/0x1f >>> [<ffffffffffffffff>] 0xffffffffffffffff >>> >>> mdadm --detail also gets stuck if attempted, in stack like this: >>> cat /proc/2864/stack >>> [<ffffffff81564602>] md_attr_show+0x72/0xf0 >>> [<ffffffff8120f116>] fill_read_buffer.isra.8+0x66/0xf0 >>> [<ffffffff8120f244>] sysfs_read_file+0xa4/0xc0 >>> [<ffffffff8119b0d0>] vfs_read+0xb0/0x180 >>> [<ffffffff8119b1f2>] sys_read+0x52/0xa0 >>> [<ffffffff816f511d>] system_call_fastpath+0x1a/0x1f >>> [<ffffffffffffffff>] 0xffffffffffffffff >>> >>> Might your patch https://patchwork.kernel.org/patch/2260051/ fix this >> >> Probably. >> >>> issue? Is this patch alone applicable to kernel 3.8.2? >> >> Probably. > > This problem still happens in 3.8.13, which contains the above fix of > yours. I am currently debugging it deeper, but still don't know what > the issue is. > The problem happens when raid1_add_disk() calls raise_barrier(). > conf->nr_waiting is 0 at this point. So then raise_barrier() starts > waiting for conf->nr_pending getting to 0, but it never gets to 0. > From this point, anybody that calls wait_barrier() is also stuck. > > This looks like a real deadlock in a non-ancient kernel. I realize > that 3.8.x is EOL, but still it's only one stable behind. > Can you help further with this problem? > > Another problem that I see often, is when a drive fails in RAID1, and > then rdev is marked as "Blocked", but then Blocked bit never gets > cleared for some reason. So this drive is not ejected from the array. > Again, I was unable to debug it deeper at this point. > > Thanks, > Alex. > > > > > > > > > > > > >> >>> Can you pls kindly comment on this. >>> >> >> NeilBrown
[ 514.714425] md: bind<dm-5> [ 514.718751] md2: raise_barrier(1): conf->barrier=0 conf->nr_waiting=0 conf->nr_pending=191 conf->pending_count=0 [ 514.718755] md2: raise_barrier(2): conf->barrier=1 conf->nr_waiting=0 conf->nr_pending=191 conf->pending_count=0 [ 514.720564] md2: wait_barrier(PID=7079,1) conf->barrier=1 conf->nr_waiting=1 conf->nr_pending=191 conf->pending_count=0 [ 514.722073] md2: wait_barrier(PID=7069,1) conf->barrier=1 conf->nr_waiting=2 conf->nr_pending=191 conf->pending_count=0 [ 514.724602] md2: wait_barrier(PID=9922,1) conf->barrier=1 conf->nr_waiting=3 conf->nr_pending=191 conf->pending_count=0 [ 514.725575] md2: wait_barrier(PID=9909,1) conf->barrier=1 conf->nr_waiting=4 conf->nr_pending=187 conf->pending_count=0 [ 514.725661] md2: wait_barrier(PID=7082,1) conf->barrier=1 conf->nr_waiting=5 conf->nr_pending=187 conf->pending_count=0 [ 514.725836] md2: wait_barrier(PID=9910,1) conf->barrier=1 conf->nr_waiting=6 conf->nr_pending=178 conf->pending_count=0 [ 514.725848] md2: wait_barrier(PID=9918,1) conf->barrier=1 conf->nr_waiting=7 conf->nr_pending=178 conf->pending_count=0 [ 514.725860] md2: wait_barrier(PID=7077,1) conf->barrier=1 conf->nr_waiting=8 conf->nr_pending=178 conf->pending_count=0 [ 514.725872] md2: wait_barrier(PID=9905,1) conf->barrier=1 conf->nr_waiting=9 conf->nr_pending=178 conf->pending_count=0 [ 514.725884] md2: wait_barrier(PID=7147,1) conf->barrier=1 conf->nr_waiting=10 conf->nr_pending=178 conf->pending_count=0 [ 514.725917] md2: wait_barrier(PID=10217,1) conf->barrier=1 conf->nr_waiting=11 conf->nr_pending=178 conf->pending_count=0 [ 514.727122] md2: wait_barrier(PID=7177,1) conf->barrier=1 conf->nr_waiting=12 conf->nr_pending=155 conf->pending_count=0 [ 514.727446] md2: wait_barrier(PID=9904,1) conf->barrier=1 conf->nr_waiting=13 conf->nr_pending=155 conf->pending_count=0 [ 514.727475] md2: wait_barrier(PID=9894,1) conf->barrier=1 conf->nr_waiting=14 conf->nr_pending=155 conf->pending_count=0 [ 514.727489] md2: wait_barrier(PID=9898,1) conf->barrier=1 conf->nr_waiting=15 conf->nr_pending=155 conf->pending_count=0 [ 514.727507] md2: wait_barrier(PID=7072,1) conf->barrier=1 conf->nr_waiting=16 conf->nr_pending=155 conf->pending_count=0 [ 514.727651] md2: wait_barrier(PID=9899,1) conf->barrier=1 conf->nr_waiting=17 conf->nr_pending=150 conf->pending_count=0 [ 514.727853] md2: wait_barrier(PID=9906,1) conf->barrier=1 conf->nr_waiting=18 conf->nr_pending=150 conf->pending_count=0 [ 514.727867] md2: wait_barrier(PID=9912,1) conf->barrier=1 conf->nr_waiting=19 conf->nr_pending=150 conf->pending_count=0 [ 514.727915] md2: wait_barrier(PID=6888,1) conf->barrier=1 conf->nr_waiting=20 conf->nr_pending=150 conf->pending_count=0 [ 514.728109] md2: wait_barrier(PID=7080,1) conf->barrier=1 conf->nr_waiting=21 conf->nr_pending=150 conf->pending_count=0 [ 514.728112] md2: wait_barrier(PID=7080,2) conf->barrier=1 conf->nr_waiting=20 conf->nr_pending=151 conf->pending_count=0 [ 514.728136] md2: wait_barrier(PID=7080,1) conf->barrier=1 conf->nr_waiting=21 conf->nr_pending=151 conf->pending_count=1 [ 514.728138] md2: wait_barrier(PID=7080,2) conf->barrier=1 conf->nr_waiting=20 conf->nr_pending=152 conf->pending_count=1 [ 514.728143] md2: wait_barrier(PID=7080,1) conf->barrier=1 conf->nr_waiting=21 conf->nr_pending=152 conf->pending_count=2 [ 514.728145] md2: wait_barrier(PID=7080,2) conf->barrier=1 conf->nr_waiting=20 conf->nr_pending=153 conf->pending_count=2 [ 514.728149] md2: wait_barrier(PID=7080,1) conf->barrier=1 conf->nr_waiting=21 conf->nr_pending=153 conf->pending_count=3 [ 514.730110] md2: wait_barrier(PID=9919,1) conf->barrier=1 conf->nr_waiting=22 conf->nr_pending=153 conf->pending_count=3 [ 514.730188] md2: wait_barrier(PID=5773,1) conf->barrier=1 conf->nr_waiting=23 conf->nr_pending=153 conf->pending_count=3 [ 514.731139] md2: wait_barrier(PID=7174,1) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=153 conf->pending_count=3 [ 514.731143] md2: wait_barrier(PID=7174,2) conf->barrier=1 conf->nr_waiting=23 conf->nr_pending=154 conf->pending_count=3 [ 514.731176] md2: wait_barrier(PID=7174,1) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=154 conf->pending_count=4 [ 514.731179] md2: wait_barrier(PID=7174,2) conf->barrier=1 conf->nr_waiting=23 conf->nr_pending=155 conf->pending_count=4 [ 514.731183] md2: wait_barrier(PID=7174,1) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=155 conf->pending_count=5 [ 514.731186] md2: wait_barrier(PID=7174,2) conf->barrier=1 conf->nr_waiting=23 conf->nr_pending=156 conf->pending_count=5 [ 514.731193] md2: wait_barrier(PID=7174,1) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=156 conf->pending_count=6 [ 514.731371] md2: wait_barrier(PID=7070,1) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=156 conf->pending_count=6 [ 514.731374] md2: wait_barrier(PID=7070,2) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=157 conf->pending_count=6 [ 514.731382] md2: wait_barrier(PID=7070,1) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=157 conf->pending_count=7 [ 514.731385] md2: wait_barrier(PID=7070,2) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=158 conf->pending_count=7 [ 514.731388] md2: wait_barrier(PID=7070,1) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=158 conf->pending_count=8 [ 514.731390] md2: wait_barrier(PID=7070,2) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=159 conf->pending_count=8 [ 514.731394] md2: wait_barrier(PID=7070,1) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=159 conf->pending_count=9 [ 514.731491] md2: wait_barrier(PID=9888,1) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=159 conf->pending_count=9 [ 514.731494] md2: wait_barrier(PID=9888,2) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=160 conf->pending_count=9 [ 514.731502] md2: wait_barrier(PID=9888,1) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=160 conf->pending_count=10 [ 514.731505] md2: wait_barrier(PID=9888,2) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=161 conf->pending_count=10 [ 514.731508] md2: wait_barrier(PID=9888,1) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=161 conf->pending_count=11 [ 514.731510] md2: wait_barrier(PID=9888,2) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=162 conf->pending_count=11 [ 514.731514] md2: wait_barrier(PID=9888,1) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=162 conf->pending_count=12 [ 514.731755] md2: wait_barrier(PID=9895,1) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=162 conf->pending_count=12 [ 514.731758] md2: wait_barrier(PID=9895,2) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=163 conf->pending_count=12 [ 514.731767] md2: wait_barrier(PID=9895,1) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=163 conf->pending_count=13 [ 514.731769] md2: wait_barrier(PID=9895,2) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=164 conf->pending_count=13 [ 514.731773] md2: wait_barrier(PID=9895,1) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=164 conf->pending_count=14 [ 514.731775] md2: wait_barrier(PID=9895,2) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=165 conf->pending_count=14 [ 514.731826] md2: wait_barrier(PID=9895,1) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=165 conf->pending_count=15 [ 514.731875] md2: wait_barrier(PID=9897,1) conf->barrier=1 conf->nr_waiting=28 conf->nr_pending=165 conf->pending_count=15 [ 514.731878] md2: wait_barrier(PID=9897,2) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=166 conf->pending_count=15 [ 514.731884] md2: wait_barrier(PID=9897,1) conf->barrier=1 conf->nr_waiting=28 conf->nr_pending=166 conf->pending_count=16 [ 514.731887] md2: wait_barrier(PID=9897,2) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=167 conf->pending_count=16 [ 514.731890] md2: wait_barrier(PID=9897,1) conf->barrier=1 conf->nr_waiting=28 conf->nr_pending=167 conf->pending_count=17 [ 514.731893] md2: wait_barrier(PID=9897,2) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=168 conf->pending_count=17 [ 514.731902] md2: wait_barrier(PID=9897,1) conf->barrier=1 conf->nr_waiting=28 conf->nr_pending=168 conf->pending_count=18 [ 514.731971] md2: wait_barrier(PID=9896,1) conf->barrier=1 conf->nr_waiting=29 conf->nr_pending=168 conf->pending_count=18 [ 514.732304] md2: wait_barrier(PID=9908,1) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=168 conf->pending_count=18 [ 514.732307] md2: wait_barrier(PID=9908,2) conf->barrier=1 conf->nr_waiting=29 conf->nr_pending=169 conf->pending_count=18 [ 514.732340] md2: wait_barrier(PID=9908,1) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=169 conf->pending_count=19 [ 514.732343] md2: wait_barrier(PID=9908,2) conf->barrier=1 conf->nr_waiting=29 conf->nr_pending=170 conf->pending_count=19 [ 514.732346] md2: wait_barrier(PID=9908,1) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=170 conf->pending_count=20 [ 514.732349] md2: wait_barrier(PID=9908,2) conf->barrier=1 conf->nr_waiting=29 conf->nr_pending=171 conf->pending_count=20 [ 514.732352] md2: wait_barrier(PID=9908,1) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=171 conf->pending_count=21 [ 514.732397] md2: wait_barrier(PID=5774,1) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=171 conf->pending_count=21 [ 514.732400] md2: wait_barrier(PID=5774,2) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=172 conf->pending_count=21 [ 514.732412] md2: wait_barrier(PID=5774,1) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=172 conf->pending_count=22 [ 514.732414] md2: wait_barrier(PID=5774,2) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=173 conf->pending_count=22 [ 514.732419] md2: wait_barrier(PID=5774,1) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=173 conf->pending_count=23 [ 514.732421] md2: wait_barrier(PID=5774,2) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=174 conf->pending_count=23 [ 514.732424] md2: wait_barrier(PID=5774,1) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=174 conf->pending_count=24 [ 514.732459] md2: wait_barrier(PID=7068,1) conf->barrier=1 conf->nr_waiting=32 conf->nr_pending=174 conf->pending_count=24 [ 514.732461] md2: wait_barrier(PID=7068,2) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=175 conf->pending_count=24 [ 514.732467] md2: wait_barrier(PID=7068,1) conf->barrier=1 conf->nr_waiting=32 conf->nr_pending=175 conf->pending_count=25 [ 514.732469] md2: wait_barrier(PID=7068,2) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=176 conf->pending_count=25 [ 514.732473] md2: wait_barrier(PID=7068,1) conf->barrier=1 conf->nr_waiting=32 conf->nr_pending=176 conf->pending_count=26 [ 514.732475] md2: wait_barrier(PID=7068,2) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=177 conf->pending_count=26 [ 514.732478] md2: wait_barrier(PID=7068,1) conf->barrier=1 conf->nr_waiting=32 conf->nr_pending=177 conf->pending_count=27 [ 514.734807] md2: wait_barrier(PID=9901,1) conf->barrier=1 conf->nr_waiting=33 conf->nr_pending=177 conf->pending_count=27 [ 514.734830] md2: wait_barrier(PID=9923,1) conf->barrier=1 conf->nr_waiting=34 conf->nr_pending=177 conf->pending_count=27 [ 514.735975] md2: wait_barrier(PID=7175,1) conf->barrier=1 conf->nr_waiting=35 conf->nr_pending=158 conf->pending_count=27 [ 514.735992] md2: wait_barrier(PID=7081,1) conf->barrier=1 conf->nr_waiting=36 conf->nr_pending=158 conf->pending_count=27 [ 514.736083] md2: wait_barrier(PID=9921,1) conf->barrier=1 conf->nr_waiting=37 conf->nr_pending=158 conf->pending_count=27 [ 514.736670] md2: wait_barrier(PID=7075,1) conf->barrier=1 conf->nr_waiting=38 conf->nr_pending=158 conf->pending_count=27 [ 514.740425] md2: wait_barrier(PID=9925,1) conf->barrier=1 conf->nr_waiting=39 conf->nr_pending=125 conf->pending_count=27 [ 514.740448] md2: wait_barrier(PID=41,1) conf->barrier=1 conf->nr_waiting=40 conf->nr_pending=125 conf->pending_count=27 [ 514.740465] md2: wait_barrier(PID=7083,1) conf->barrier=1 conf->nr_waiting=41 conf->nr_pending=125 conf->pending_count=27 [ 514.740478] md2: wait_barrier(PID=9911,1) conf->barrier=1 conf->nr_waiting=42 conf->nr_pending=125 conf->pending_count=27 [ 514.740492] md2: wait_barrier(PID=9903,1) conf->barrier=1 conf->nr_waiting=43 conf->nr_pending=125 conf->pending_count=27 [ 514.741067] md2: wait_barrier(PID=7071,1) conf->barrier=1 conf->nr_waiting=44 conf->nr_pending=125 conf->pending_count=27 [ 514.743463] md2: wait_barrier(PID=9920,1) conf->barrier=1 conf->nr_waiting=45 conf->nr_pending=105 conf->pending_count=27 [ 514.743478] md2: wait_barrier(PID=9924,1) conf->barrier=1 conf->nr_waiting=46 conf->nr_pending=105 conf->pending_count=27 [ 514.743489] md2: wait_barrier(PID=7146,1) conf->barrier=1 conf->nr_waiting=47 conf->nr_pending=105 conf->pending_count=27 [ 514.745765] md2: wait_barrier(PID=9890,1) conf->barrier=1 conf->nr_waiting=48 conf->nr_pending=89 conf->pending_count=27 [ 514.745784] md2: wait_barrier(PID=9879,1) conf->barrier=1 conf->nr_waiting=49 conf->nr_pending=89 conf->pending_count=27 [ 514.746556] md2: wait_barrier(PID=6,1) conf->barrier=1 conf->nr_waiting=50 conf->nr_pending=81 conf->pending_count=27 [ 514.748897] md2: wait_barrier(PID=9966,1) conf->barrier=1 conf->nr_waiting=51 conf->nr_pending=75 conf->pending_count=27 [ 514.749195] md2: wait_barrier(PID=9900,1) conf->barrier=1 conf->nr_waiting=52 conf->nr_pending=67 conf->pending_count=27 [ 514.750047] md2: wait_barrier(PID=9887,1) conf->barrier=1 conf->nr_waiting=53 conf->nr_pending=57 conf->pending_count=27 [ 514.750065] md2: wait_barrier(PID=9889,1) conf->barrier=1 conf->nr_waiting=54 conf->nr_pending=57 conf->pending_count=27 [ 514.750076] md2: wait_barrier(PID=9893,1) conf->barrier=1 conf->nr_waiting=55 conf->nr_pending=57 conf->pending_count=27 [ 514.752565] md2: wait_barrier(PID=9915,1) conf->barrier=1 conf->nr_waiting=56 conf->nr_pending=48 conf->pending_count=27 [ 514.752580] md2: wait_barrier(PID=7074,1) conf->barrier=1 conf->nr_waiting=57 conf->nr_pending=48 conf->pending_count=27 [ 514.752592] md2: wait_barrier(PID=9902,1) conf->barrier=1 conf->nr_waiting=58 conf->nr_pending=48 conf->pending_count=27 [ 514.752603] md2: wait_barrier(PID=9916,1) conf->barrier=1 conf->nr_waiting=59 conf->nr_pending=48 conf->pending_count=27 [ 514.752926] md2: wait_barrier(PID=42,1) conf->barrier=1 conf->nr_waiting=60 conf->nr_pending=27 conf->pending_count=27 [ 514.752946] md2: wait_barrier(PID=7073,1) conf->barrier=1 conf->nr_waiting=61 conf->nr_pending=27 conf->pending_count=27 [ 514.752960] md2: wait_barrier(PID=9914,1) conf->barrier=1 conf->nr_waiting=62 conf->nr_pending=27 conf->pending_count=27
static void raise_barrier(struct r1conf *conf) { spin_lock_irq(&conf->resync_lock); pr_info("%s: raise_barrier(1): conf->barrier=%d conf->nr_waiting=%d conf->nr_pending=%d conf->pending_count=%d\n", mdname(conf->mddev), conf->barrier, conf->nr_waiting, conf->nr_pending, conf->pending_count); /* Wait until no block IO is waiting */ wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting, conf->resync_lock); /* block any new IO from starting */ conf->barrier++; pr_info("%s: raise_barrier(2): conf->barrier=%d conf->nr_waiting=%d conf->nr_pending=%d conf->pending_count=%d\n", mdname(conf->mddev), conf->barrier, conf->nr_waiting, conf->nr_pending, conf->pending_count); /* Now wait for all pending IO to complete */ wait_event_lock_irq(conf->wait_barrier, !conf->nr_pending && conf->barrier < RESYNC_DEPTH, conf->resync_lock); pr_info("%s: raise_barrier(3): conf->barrier=%d conf->nr_waiting=%d conf->nr_pending=%d conf->pending_count=%d\n", mdname(conf->mddev), conf->barrier, conf->nr_waiting, conf->nr_pending, conf->pending_count); spin_unlock_irq(&conf->resync_lock); } static void wait_barrier(struct r1conf *conf) { int waited = 0; spin_lock_irq(&conf->resync_lock); if (conf->barrier) { conf->nr_waiting++; waited = 1; pr_info("%s: wait_barrier(PID=%d,1) conf->barrier=%d conf->nr_waiting=%d conf->nr_pending=%d conf->pending_count=%d\n", mdname(conf->mddev), current->pid, conf->barrier, conf->nr_waiting, conf->nr_pending, conf->pending_count); if (conf->nr_pending && current->bio_list && !bio_list_empty(current->bio_list)) dump_stack(); /* Wait for the barrier to drop. * However if there are already pending * requests (preventing the barrier from * rising completely), and the * pre-process bio queue isn't empty, * then don't wait, as we need to empty * that queue to get the nr_pending * count down. */ wait_event_lock_irq(conf->wait_barrier, !conf->barrier || (conf->nr_pending && current->bio_list && !bio_list_empty(current->bio_list)), conf->resync_lock); conf->nr_waiting--; } conf->nr_pending++; if (waited) pr_info("%s: wait_barrier(PID=%d,2) conf->barrier=%d conf->nr_waiting=%d conf->nr_pending=%d conf->pending_count=%d\n", mdname(conf->mddev), current->pid, conf->barrier, conf->nr_waiting, conf->nr_pending, conf->pending_count); spin_unlock_irq(&conf->resync_lock); }