Re: BUG - raid 1 deadlock on handle_read_error / wait_barrier

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Neil,
I believe I have found what is causing the deadlock. It happens in two flavors:

1)
# raid1d() is called, and conf->pending_bio_list is non-empty at this point
# raid1d() calls md_check_recovery(), which eventually calls
raid1_add_disk(), which calls raise_barrier()
# Now raise_barrier will wait for conf->nr_pending to become 0, but it
cannot become 0, because there are bios sitting in
conf->pending_bio_list, which nobody will flush, because raid1d is the
one supposed to call flush_pending_writes(), either directly or via
handle_read_error. But it is stuck in raise_barrier.

2)
# raid1_add_disk() calls raise_barrier(), and waits for
conf->nr_pending to become 0, as before
# new WRITE comes and calls wait_barrier(), but this thread has a
non-empty current->bio_list
# In this case, the code allows the WRITE to go through
wait_barrier(), and trigger WRITEs to mirror legs, but these WRITEs
again end up in conf->pending_bio_list (either via raid1_unplug or
directly). But nobody will flush conf->pending_bio_list, because
raid1d is stuck in raise_barrier.

Previously, for example in kernel 3.2, raid1_add_disk did not call
raise_barrier, so this problem did not happen.

Attached is a reproduction with some prints that I added to
raise_barrier and wait_barrier (their code also attached). It
demonstrates case 2. It shows that once raise_barrier got called,
conf->nr_pending drops down, until it equals the number of
wait_barrier calls, that slipped through because of non-empty
current->bio_list. And at this point, this array hangs.

Can you please comment on how to fix this problem. It looks like a
real deadlock.
We can perhaps call md_check_recovery() after flush_pending_writes(),
but this only makes the window smaller, not closes it entirely. But it
looks like we really should not be calling raise_barrier from raid1d.

Thanks,
Alex.


On Thu, May 30, 2013 at 5:30 PM, Alexander Lyakas
<alex.bolshoy@xxxxxxxxx> wrote:
> Hi Neil,
>
> On Mon, May 20, 2013 at 10:17 AM, NeilBrown <neilb@xxxxxxx> wrote:
>> On Thu, 16 May 2013 17:07:04 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
>> wrote:
>>
>>> Hello Neil,
>>> we are hitting issue that looks very similar; we are on kernel 3.8.2.
>>> The array is a 2-device raid1, which experienced a drive failure, but
>>> then drive was removed and re-added back to the array (although
>>> rebuild never started). Relevant parts of the kernel log:
>>>
>>> May 16 11:12:14  kernel: [46850.090499] md/raid1:md2: Disk failure on
>>> dm-6, disabling device.
>>> May 16 11:12:14  kernel: [46850.090499] md/raid1:md2: Operation
>>> continuing on 1 devices.
>>> May 16 11:12:14  kernel: [46850.090511] md: super_written gets
>>> error=-5, uptodate=0
>>> May 16 11:12:14  kernel: [46850.090676] md/raid1:md2: dm-6:
>>> rescheduling sector 18040736
>>> May 16 11:12:14  kernel: [46850.090764] md/raid1:md2: dm-6:
>>> rescheduling sector 20367040
>>> May 16 11:12:14  kernel: [46850.090826] md/raid1:md2: dm-6:
>>> rescheduling sector 17613504
>>> May 16 11:12:14  kernel: [46850.090883] md/raid1:md2: dm-6:
>>> rescheduling sector 18042720
>>> May 16 11:12:15  kernel: [46850.229970] md/raid1:md2: redirecting
>>> sector 18040736 to other mirror: dm-13
>>> May 16 11:12:15  kernel: [46850.257687] md/raid1:md2: redirecting
>>> sector 20367040 to other mirror: dm-13
>>> May 16 11:12:15  kernel: [46850.268731] md/raid1:md2: redirecting
>>> sector 17613504 to other mirror: dm-13
>>> May 16 11:12:15  kernel: [46850.398242] md/raid1:md2: redirecting
>>> sector 18042720 to other mirror: dm-13
>>> May 16 11:12:23  kernel: [46858.448465] md: unbind<dm-6>
>>> May 16 11:12:23  kernel: [46858.456081] md: export_rdev(dm-6)
>>> May 16 11:23:19  kernel: [47515.062547] md: bind<dm-6>
>>>
>>> May 16 11:24:28  kernel: [47583.920126] INFO: task md2_raid1:15749
>>> blocked for more than 60 seconds.
>>> May 16 11:24:28  kernel: [47583.921829] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> May 16 11:24:28  kernel: [47583.923361] md2_raid1       D
>>> 0000000000000001     0 15749      2 0x00000000
>>> May 16 11:24:28  kernel: [47583.923367]  ffff880097c23c28
>>> 0000000000000046 ffff880000000002 00000000982c43b8
>>> May 16 11:24:28  kernel: [47583.923372]  ffff880097c23fd8
>>> ffff880097c23fd8 ffff880097c23fd8 0000000000013f40
>>> May 16 11:24:28  kernel: [47583.923376]  ffff880119b11740
>>> ffff8800a5489740 ffff880097c23c38 ffff88011609d3c0
>>> May 16 11:24:28  kernel: [47583.923381] Call Trace:
>>> May 16 11:24:28  kernel: [47583.923395]  [<ffffffff816eb399>] schedule+0x29/0x70
>>> May 16 11:24:28  kernel: [47583.923402]  [<ffffffffa0516736>]
>>> raise_barrier+0x106/0x160 [raid1]
>>> May 16 11:24:28  kernel: [47583.923410]  [<ffffffff8107fcc0>] ?
>>> add_wait_queue+0x60/0x60
>>> May 16 11:24:28  kernel: [47583.923415]  [<ffffffffa0516af7>]
>>> raid1_add_disk+0x197/0x200 [raid1]
>>> May 16 11:24:28  kernel: [47583.923421]  [<ffffffff81567fa4>]
>>> remove_and_add_spares+0x104/0x220
>>> May 16 11:24:28  kernel: [47583.923426]  [<ffffffff8156a02d>]
>>> md_check_recovery.part.49+0x40d/0x530
>>> May 16 11:24:28  kernel: [47583.923429]  [<ffffffff8156a165>]
>>> md_check_recovery+0x15/0x20
>>> May 16 11:24:28  kernel: [47583.923433]  [<ffffffffa0517e42>]
>>> raid1d+0x22/0x180 [raid1]
>>> May 16 11:24:28  kernel: [47583.923439]  [<ffffffff81045cd9>] ?
>>> default_spin_lock_flags+0x9/0x10
>>> May 16 11:24:28  kernel: [47583.923443]  [<ffffffff81045cd9>] ?
>>> default_spin_lock_flags+0x9/0x10
>>> May 16 11:24:28  kernel: [47583.923449]  [<ffffffff815624ed>]
>>> md_thread+0x10d/0x140
>>> May 16 11:24:28  kernel: [47583.923453]  [<ffffffff8107fcc0>] ?
>>> add_wait_queue+0x60/0x60
>>> May 16 11:24:28  kernel: [47583.923457]  [<ffffffff815623e0>] ?
>>> md_rdev_init+0x140/0x140
>>> May 16 11:24:28  kernel: [47583.923461]  [<ffffffff8107f0d0>] kthread+0xc0/0xd0
>>> May 16 11:24:28  kernel: [47583.923465]  [<ffffffff8107f010>] ?
>>> flush_kthread_worker+0xb0/0xb0
>>> May 16 11:24:28  kernel: [47583.923472]  [<ffffffff816f506c>]
>>> ret_from_fork+0x7c/0xb0
>>> May 16 11:24:28  kernel: [47583.923476]  [<ffffffff8107f010>] ?
>>> flush_kthread_worker+0xb0/0xb0
>>>
>>> dm-13 is the good drive, dm-6 is the one that failed.
>>>
>>> At this point, we have several threads calling submit_bio and all
>>> stuck like this:
>>>
>>> cat /proc/16218/stack
>>> [<ffffffffa0516d6e>] wait_barrier+0xbe/0x160 [raid1]
>>> [<ffffffffa0518627>] make_request+0x87/0xa90 [raid1]
>>> [<ffffffff81561ed0>] md_make_request+0xd0/0x200
>>> [<ffffffff8132bcaa>] generic_make_request+0xca/0x100
>>> [<ffffffff8132bd5b>] submit_bio+0x7b/0x160
>>> ...
>>>
>>> And md raid1 thread stuck like this:
>>>
>>> cat /proc/15749/stack
>>> [<ffffffffa0516736>] raise_barrier+0x106/0x160 [raid1]
>>> [<ffffffffa0516af7>] raid1_add_disk+0x197/0x200 [raid1]
>>> [<ffffffff81567fa4>] remove_and_add_spares+0x104/0x220
>>> [<ffffffff8156a02d>] md_check_recovery.part.49+0x40d/0x530
>>> [<ffffffff8156a165>] md_check_recovery+0x15/0x20
>>> [<ffffffffa0517e42>] raid1d+0x22/0x180 [raid1]
>>> [<ffffffff815624ed>] md_thread+0x10d/0x140
>>> [<ffffffff8107f0d0>] kthread+0xc0/0xd0
>>> [<ffffffff816f506c>] ret_from_fork+0x7c/0xb0
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> We have also two user-space threads stuck:
>>>
>>> one is trying to read /sys/block/md2/md/array_state and its kernel stack is:
>>> # cat /proc/2251/stack
>>> [<ffffffff81564602>] md_attr_show+0x72/0xf0
>>> [<ffffffff8120f116>] fill_read_buffer.isra.8+0x66/0xf0
>>> [<ffffffff8120f244>] sysfs_read_file+0xa4/0xc0
>>> [<ffffffff8119b0d0>] vfs_read+0xb0/0x180
>>> [<ffffffff8119b1f2>] sys_read+0x52/0xa0
>>> [<ffffffff816f511d>] system_call_fastpath+0x1a/0x1f
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> the other wants to read from /proc/mdstat and is:
>>> [<ffffffff81563d2b>] md_seq_show+0x4b/0x540
>>> [<ffffffff811bdd1b>] seq_read+0x16b/0x400
>>> [<ffffffff811ff572>] proc_reg_read+0x82/0xc0
>>> [<ffffffff8119b0d0>] vfs_read+0xb0/0x180
>>> [<ffffffff8119b1f2>] sys_read+0x52/0xa0
>>> [<ffffffff816f511d>] system_call_fastpath+0x1a/0x1f
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> mdadm --detail also gets stuck if attempted, in stack like this:
>>> cat /proc/2864/stack
>>> [<ffffffff81564602>] md_attr_show+0x72/0xf0
>>> [<ffffffff8120f116>] fill_read_buffer.isra.8+0x66/0xf0
>>> [<ffffffff8120f244>] sysfs_read_file+0xa4/0xc0
>>> [<ffffffff8119b0d0>] vfs_read+0xb0/0x180
>>> [<ffffffff8119b1f2>] sys_read+0x52/0xa0
>>> [<ffffffff816f511d>] system_call_fastpath+0x1a/0x1f
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> Might your patch https://patchwork.kernel.org/patch/2260051/ fix this
>>
>> Probably.
>>
>>> issue? Is this patch alone applicable to kernel 3.8.2?
>>
>> Probably.
>
> This problem still happens in 3.8.13, which contains the above fix of
> yours. I am currently debugging it deeper, but still don't know what
> the issue is.
> The problem happens when raid1_add_disk() calls raise_barrier().
> conf->nr_waiting is 0 at this point. So then raise_barrier() starts
> waiting for conf->nr_pending getting to 0, but it never gets to 0.
> From this point, anybody that calls wait_barrier() is also stuck.
>
> This looks like a real deadlock in a non-ancient kernel. I realize
> that 3.8.x is EOL, but still it's only one stable behind.
> Can you help further with this problem?
>
> Another problem that I see often, is when a drive fails in RAID1, and
> then rdev is marked as "Blocked", but then Blocked bit never gets
> cleared for some reason. So this drive is not ejected from the array.
> Again, I was unable to debug it deeper at this point.
>
> Thanks,
> Alex.
>
>
>
>
>
>
>
>
>
>
>
>
>>
>>> Can you pls kindly comment on this.
>>>
>>
>> NeilBrown
[  514.714425] md: bind<dm-5>
[  514.718751] md2: raise_barrier(1): conf->barrier=0 conf->nr_waiting=0 conf->nr_pending=191 conf->pending_count=0
[  514.718755] md2: raise_barrier(2): conf->barrier=1 conf->nr_waiting=0 conf->nr_pending=191 conf->pending_count=0
[  514.720564] md2: wait_barrier(PID=7079,1) conf->barrier=1 conf->nr_waiting=1 conf->nr_pending=191 conf->pending_count=0
[  514.722073] md2: wait_barrier(PID=7069,1) conf->barrier=1 conf->nr_waiting=2 conf->nr_pending=191 conf->pending_count=0
[  514.724602] md2: wait_barrier(PID=9922,1) conf->barrier=1 conf->nr_waiting=3 conf->nr_pending=191 conf->pending_count=0
[  514.725575] md2: wait_barrier(PID=9909,1) conf->barrier=1 conf->nr_waiting=4 conf->nr_pending=187 conf->pending_count=0
[  514.725661] md2: wait_barrier(PID=7082,1) conf->barrier=1 conf->nr_waiting=5 conf->nr_pending=187 conf->pending_count=0
[  514.725836] md2: wait_barrier(PID=9910,1) conf->barrier=1 conf->nr_waiting=6 conf->nr_pending=178 conf->pending_count=0
[  514.725848] md2: wait_barrier(PID=9918,1) conf->barrier=1 conf->nr_waiting=7 conf->nr_pending=178 conf->pending_count=0
[  514.725860] md2: wait_barrier(PID=7077,1) conf->barrier=1 conf->nr_waiting=8 conf->nr_pending=178 conf->pending_count=0
[  514.725872] md2: wait_barrier(PID=9905,1) conf->barrier=1 conf->nr_waiting=9 conf->nr_pending=178 conf->pending_count=0
[  514.725884] md2: wait_barrier(PID=7147,1) conf->barrier=1 conf->nr_waiting=10 conf->nr_pending=178 conf->pending_count=0
[  514.725917] md2: wait_barrier(PID=10217,1) conf->barrier=1 conf->nr_waiting=11 conf->nr_pending=178 conf->pending_count=0
[  514.727122] md2: wait_barrier(PID=7177,1) conf->barrier=1 conf->nr_waiting=12 conf->nr_pending=155 conf->pending_count=0
[  514.727446] md2: wait_barrier(PID=9904,1) conf->barrier=1 conf->nr_waiting=13 conf->nr_pending=155 conf->pending_count=0
[  514.727475] md2: wait_barrier(PID=9894,1) conf->barrier=1 conf->nr_waiting=14 conf->nr_pending=155 conf->pending_count=0
[  514.727489] md2: wait_barrier(PID=9898,1) conf->barrier=1 conf->nr_waiting=15 conf->nr_pending=155 conf->pending_count=0
[  514.727507] md2: wait_barrier(PID=7072,1) conf->barrier=1 conf->nr_waiting=16 conf->nr_pending=155 conf->pending_count=0
[  514.727651] md2: wait_barrier(PID=9899,1) conf->barrier=1 conf->nr_waiting=17 conf->nr_pending=150 conf->pending_count=0
[  514.727853] md2: wait_barrier(PID=9906,1) conf->barrier=1 conf->nr_waiting=18 conf->nr_pending=150 conf->pending_count=0
[  514.727867] md2: wait_barrier(PID=9912,1) conf->barrier=1 conf->nr_waiting=19 conf->nr_pending=150 conf->pending_count=0
[  514.727915] md2: wait_barrier(PID=6888,1) conf->barrier=1 conf->nr_waiting=20 conf->nr_pending=150 conf->pending_count=0
[  514.728109] md2: wait_barrier(PID=7080,1) conf->barrier=1 conf->nr_waiting=21 conf->nr_pending=150 conf->pending_count=0
[  514.728112] md2: wait_barrier(PID=7080,2) conf->barrier=1 conf->nr_waiting=20 conf->nr_pending=151 conf->pending_count=0
[  514.728136] md2: wait_barrier(PID=7080,1) conf->barrier=1 conf->nr_waiting=21 conf->nr_pending=151 conf->pending_count=1
[  514.728138] md2: wait_barrier(PID=7080,2) conf->barrier=1 conf->nr_waiting=20 conf->nr_pending=152 conf->pending_count=1
[  514.728143] md2: wait_barrier(PID=7080,1) conf->barrier=1 conf->nr_waiting=21 conf->nr_pending=152 conf->pending_count=2
[  514.728145] md2: wait_barrier(PID=7080,2) conf->barrier=1 conf->nr_waiting=20 conf->nr_pending=153 conf->pending_count=2
[  514.728149] md2: wait_barrier(PID=7080,1) conf->barrier=1 conf->nr_waiting=21 conf->nr_pending=153 conf->pending_count=3
[  514.730110] md2: wait_barrier(PID=9919,1) conf->barrier=1 conf->nr_waiting=22 conf->nr_pending=153 conf->pending_count=3
[  514.730188] md2: wait_barrier(PID=5773,1) conf->barrier=1 conf->nr_waiting=23 conf->nr_pending=153 conf->pending_count=3
[  514.731139] md2: wait_barrier(PID=7174,1) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=153 conf->pending_count=3
[  514.731143] md2: wait_barrier(PID=7174,2) conf->barrier=1 conf->nr_waiting=23 conf->nr_pending=154 conf->pending_count=3
[  514.731176] md2: wait_barrier(PID=7174,1) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=154 conf->pending_count=4
[  514.731179] md2: wait_barrier(PID=7174,2) conf->barrier=1 conf->nr_waiting=23 conf->nr_pending=155 conf->pending_count=4
[  514.731183] md2: wait_barrier(PID=7174,1) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=155 conf->pending_count=5
[  514.731186] md2: wait_barrier(PID=7174,2) conf->barrier=1 conf->nr_waiting=23 conf->nr_pending=156 conf->pending_count=5
[  514.731193] md2: wait_barrier(PID=7174,1) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=156 conf->pending_count=6
[  514.731371] md2: wait_barrier(PID=7070,1) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=156 conf->pending_count=6
[  514.731374] md2: wait_barrier(PID=7070,2) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=157 conf->pending_count=6
[  514.731382] md2: wait_barrier(PID=7070,1) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=157 conf->pending_count=7
[  514.731385] md2: wait_barrier(PID=7070,2) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=158 conf->pending_count=7
[  514.731388] md2: wait_barrier(PID=7070,1) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=158 conf->pending_count=8
[  514.731390] md2: wait_barrier(PID=7070,2) conf->barrier=1 conf->nr_waiting=24 conf->nr_pending=159 conf->pending_count=8
[  514.731394] md2: wait_barrier(PID=7070,1) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=159 conf->pending_count=9
[  514.731491] md2: wait_barrier(PID=9888,1) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=159 conf->pending_count=9
[  514.731494] md2: wait_barrier(PID=9888,2) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=160 conf->pending_count=9
[  514.731502] md2: wait_barrier(PID=9888,1) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=160 conf->pending_count=10
[  514.731505] md2: wait_barrier(PID=9888,2) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=161 conf->pending_count=10
[  514.731508] md2: wait_barrier(PID=9888,1) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=161 conf->pending_count=11
[  514.731510] md2: wait_barrier(PID=9888,2) conf->barrier=1 conf->nr_waiting=25 conf->nr_pending=162 conf->pending_count=11
[  514.731514] md2: wait_barrier(PID=9888,1) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=162 conf->pending_count=12
[  514.731755] md2: wait_barrier(PID=9895,1) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=162 conf->pending_count=12
[  514.731758] md2: wait_barrier(PID=9895,2) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=163 conf->pending_count=12
[  514.731767] md2: wait_barrier(PID=9895,1) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=163 conf->pending_count=13
[  514.731769] md2: wait_barrier(PID=9895,2) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=164 conf->pending_count=13
[  514.731773] md2: wait_barrier(PID=9895,1) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=164 conf->pending_count=14
[  514.731775] md2: wait_barrier(PID=9895,2) conf->barrier=1 conf->nr_waiting=26 conf->nr_pending=165 conf->pending_count=14
[  514.731826] md2: wait_barrier(PID=9895,1) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=165 conf->pending_count=15
[  514.731875] md2: wait_barrier(PID=9897,1) conf->barrier=1 conf->nr_waiting=28 conf->nr_pending=165 conf->pending_count=15
[  514.731878] md2: wait_barrier(PID=9897,2) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=166 conf->pending_count=15
[  514.731884] md2: wait_barrier(PID=9897,1) conf->barrier=1 conf->nr_waiting=28 conf->nr_pending=166 conf->pending_count=16
[  514.731887] md2: wait_barrier(PID=9897,2) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=167 conf->pending_count=16
[  514.731890] md2: wait_barrier(PID=9897,1) conf->barrier=1 conf->nr_waiting=28 conf->nr_pending=167 conf->pending_count=17
[  514.731893] md2: wait_barrier(PID=9897,2) conf->barrier=1 conf->nr_waiting=27 conf->nr_pending=168 conf->pending_count=17
[  514.731902] md2: wait_barrier(PID=9897,1) conf->barrier=1 conf->nr_waiting=28 conf->nr_pending=168 conf->pending_count=18
[  514.731971] md2: wait_barrier(PID=9896,1) conf->barrier=1 conf->nr_waiting=29 conf->nr_pending=168 conf->pending_count=18
[  514.732304] md2: wait_barrier(PID=9908,1) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=168 conf->pending_count=18
[  514.732307] md2: wait_barrier(PID=9908,2) conf->barrier=1 conf->nr_waiting=29 conf->nr_pending=169 conf->pending_count=18
[  514.732340] md2: wait_barrier(PID=9908,1) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=169 conf->pending_count=19
[  514.732343] md2: wait_barrier(PID=9908,2) conf->barrier=1 conf->nr_waiting=29 conf->nr_pending=170 conf->pending_count=19
[  514.732346] md2: wait_barrier(PID=9908,1) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=170 conf->pending_count=20
[  514.732349] md2: wait_barrier(PID=9908,2) conf->barrier=1 conf->nr_waiting=29 conf->nr_pending=171 conf->pending_count=20
[  514.732352] md2: wait_barrier(PID=9908,1) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=171 conf->pending_count=21
[  514.732397] md2: wait_barrier(PID=5774,1) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=171 conf->pending_count=21
[  514.732400] md2: wait_barrier(PID=5774,2) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=172 conf->pending_count=21
[  514.732412] md2: wait_barrier(PID=5774,1) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=172 conf->pending_count=22
[  514.732414] md2: wait_barrier(PID=5774,2) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=173 conf->pending_count=22
[  514.732419] md2: wait_barrier(PID=5774,1) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=173 conf->pending_count=23
[  514.732421] md2: wait_barrier(PID=5774,2) conf->barrier=1 conf->nr_waiting=30 conf->nr_pending=174 conf->pending_count=23
[  514.732424] md2: wait_barrier(PID=5774,1) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=174 conf->pending_count=24
[  514.732459] md2: wait_barrier(PID=7068,1) conf->barrier=1 conf->nr_waiting=32 conf->nr_pending=174 conf->pending_count=24
[  514.732461] md2: wait_barrier(PID=7068,2) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=175 conf->pending_count=24
[  514.732467] md2: wait_barrier(PID=7068,1) conf->barrier=1 conf->nr_waiting=32 conf->nr_pending=175 conf->pending_count=25
[  514.732469] md2: wait_barrier(PID=7068,2) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=176 conf->pending_count=25
[  514.732473] md2: wait_barrier(PID=7068,1) conf->barrier=1 conf->nr_waiting=32 conf->nr_pending=176 conf->pending_count=26
[  514.732475] md2: wait_barrier(PID=7068,2) conf->barrier=1 conf->nr_waiting=31 conf->nr_pending=177 conf->pending_count=26
[  514.732478] md2: wait_barrier(PID=7068,1) conf->barrier=1 conf->nr_waiting=32 conf->nr_pending=177 conf->pending_count=27
[  514.734807] md2: wait_barrier(PID=9901,1) conf->barrier=1 conf->nr_waiting=33 conf->nr_pending=177 conf->pending_count=27
[  514.734830] md2: wait_barrier(PID=9923,1) conf->barrier=1 conf->nr_waiting=34 conf->nr_pending=177 conf->pending_count=27
[  514.735975] md2: wait_barrier(PID=7175,1) conf->barrier=1 conf->nr_waiting=35 conf->nr_pending=158 conf->pending_count=27
[  514.735992] md2: wait_barrier(PID=7081,1) conf->barrier=1 conf->nr_waiting=36 conf->nr_pending=158 conf->pending_count=27
[  514.736083] md2: wait_barrier(PID=9921,1) conf->barrier=1 conf->nr_waiting=37 conf->nr_pending=158 conf->pending_count=27
[  514.736670] md2: wait_barrier(PID=7075,1) conf->barrier=1 conf->nr_waiting=38 conf->nr_pending=158 conf->pending_count=27
[  514.740425] md2: wait_barrier(PID=9925,1) conf->barrier=1 conf->nr_waiting=39 conf->nr_pending=125 conf->pending_count=27
[  514.740448] md2: wait_barrier(PID=41,1) conf->barrier=1 conf->nr_waiting=40 conf->nr_pending=125 conf->pending_count=27
[  514.740465] md2: wait_barrier(PID=7083,1) conf->barrier=1 conf->nr_waiting=41 conf->nr_pending=125 conf->pending_count=27
[  514.740478] md2: wait_barrier(PID=9911,1) conf->barrier=1 conf->nr_waiting=42 conf->nr_pending=125 conf->pending_count=27
[  514.740492] md2: wait_barrier(PID=9903,1) conf->barrier=1 conf->nr_waiting=43 conf->nr_pending=125 conf->pending_count=27
[  514.741067] md2: wait_barrier(PID=7071,1) conf->barrier=1 conf->nr_waiting=44 conf->nr_pending=125 conf->pending_count=27
[  514.743463] md2: wait_barrier(PID=9920,1) conf->barrier=1 conf->nr_waiting=45 conf->nr_pending=105 conf->pending_count=27
[  514.743478] md2: wait_barrier(PID=9924,1) conf->barrier=1 conf->nr_waiting=46 conf->nr_pending=105 conf->pending_count=27
[  514.743489] md2: wait_barrier(PID=7146,1) conf->barrier=1 conf->nr_waiting=47 conf->nr_pending=105 conf->pending_count=27
[  514.745765] md2: wait_barrier(PID=9890,1) conf->barrier=1 conf->nr_waiting=48 conf->nr_pending=89 conf->pending_count=27
[  514.745784] md2: wait_barrier(PID=9879,1) conf->barrier=1 conf->nr_waiting=49 conf->nr_pending=89 conf->pending_count=27
[  514.746556] md2: wait_barrier(PID=6,1) conf->barrier=1 conf->nr_waiting=50 conf->nr_pending=81 conf->pending_count=27
[  514.748897] md2: wait_barrier(PID=9966,1) conf->barrier=1 conf->nr_waiting=51 conf->nr_pending=75 conf->pending_count=27
[  514.749195] md2: wait_barrier(PID=9900,1) conf->barrier=1 conf->nr_waiting=52 conf->nr_pending=67 conf->pending_count=27
[  514.750047] md2: wait_barrier(PID=9887,1) conf->barrier=1 conf->nr_waiting=53 conf->nr_pending=57 conf->pending_count=27
[  514.750065] md2: wait_barrier(PID=9889,1) conf->barrier=1 conf->nr_waiting=54 conf->nr_pending=57 conf->pending_count=27
[  514.750076] md2: wait_barrier(PID=9893,1) conf->barrier=1 conf->nr_waiting=55 conf->nr_pending=57 conf->pending_count=27
[  514.752565] md2: wait_barrier(PID=9915,1) conf->barrier=1 conf->nr_waiting=56 conf->nr_pending=48 conf->pending_count=27
[  514.752580] md2: wait_barrier(PID=7074,1) conf->barrier=1 conf->nr_waiting=57 conf->nr_pending=48 conf->pending_count=27
[  514.752592] md2: wait_barrier(PID=9902,1) conf->barrier=1 conf->nr_waiting=58 conf->nr_pending=48 conf->pending_count=27
[  514.752603] md2: wait_barrier(PID=9916,1) conf->barrier=1 conf->nr_waiting=59 conf->nr_pending=48 conf->pending_count=27
[  514.752926] md2: wait_barrier(PID=42,1) conf->barrier=1 conf->nr_waiting=60 conf->nr_pending=27 conf->pending_count=27
[  514.752946] md2: wait_barrier(PID=7073,1) conf->barrier=1 conf->nr_waiting=61 conf->nr_pending=27 conf->pending_count=27
[  514.752960] md2: wait_barrier(PID=9914,1) conf->barrier=1 conf->nr_waiting=62 conf->nr_pending=27 conf->pending_count=27

static void raise_barrier(struct r1conf *conf)
{
	spin_lock_irq(&conf->resync_lock);

	pr_info("%s: raise_barrier(1): conf->barrier=%d conf->nr_waiting=%d conf->nr_pending=%d conf->pending_count=%d\n", mdname(conf->mddev), conf->barrier, conf->nr_waiting, conf->nr_pending, conf->pending_count);

	/* Wait until no block IO is waiting */
	wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting,
			    conf->resync_lock);

	/* block any new IO from starting */
	conf->barrier++;

	pr_info("%s: raise_barrier(2): conf->barrier=%d conf->nr_waiting=%d conf->nr_pending=%d conf->pending_count=%d\n", mdname(conf->mddev), conf->barrier, conf->nr_waiting, conf->nr_pending, conf->pending_count);

	/* Now wait for all pending IO to complete */
	wait_event_lock_irq(conf->wait_barrier,
			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
			    conf->resync_lock);

	pr_info("%s: raise_barrier(3): conf->barrier=%d conf->nr_waiting=%d conf->nr_pending=%d conf->pending_count=%d\n", mdname(conf->mddev), conf->barrier, conf->nr_waiting, conf->nr_pending, conf->pending_count);

	spin_unlock_irq(&conf->resync_lock);
}

static void wait_barrier(struct r1conf *conf)
{
	int waited = 0;
	
	spin_lock_irq(&conf->resync_lock);
	if (conf->barrier) {
		conf->nr_waiting++;

		waited = 1;
		pr_info("%s: wait_barrier(PID=%d,1) conf->barrier=%d conf->nr_waiting=%d conf->nr_pending=%d conf->pending_count=%d\n", mdname(conf->mddev), current->pid, 
			    conf->barrier, conf->nr_waiting, conf->nr_pending, conf->pending_count);
		if (conf->nr_pending && current->bio_list && !bio_list_empty(current->bio_list))
			dump_stack();

		/* Wait for the barrier to drop.
		 * However if there are already pending
		 * requests (preventing the barrier from
		 * rising completely), and the
		 * pre-process bio queue isn't empty,
		 * then don't wait, as we need to empty
		 * that queue to get the nr_pending
		 * count down.
		 */
		wait_event_lock_irq(conf->wait_barrier,
				    !conf->barrier ||
				    (conf->nr_pending &&
				     current->bio_list &&
				     !bio_list_empty(current->bio_list)),
				    conf->resync_lock);
		conf->nr_waiting--;
	}
	conf->nr_pending++;

	if (waited)
		pr_info("%s: wait_barrier(PID=%d,2) conf->barrier=%d conf->nr_waiting=%d conf->nr_pending=%d conf->pending_count=%d\n", mdname(conf->mddev), current->pid, 
			    conf->barrier, conf->nr_waiting, conf->nr_pending, conf->pending_count);

	spin_unlock_irq(&conf->resync_lock);
}

[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux