Data-check brings system to a standstill

Hi,

I have two RAID1 arrays:
- md0, the / partition (ext3), lightly accessed
- md1, a secondary partition (ext3), mounted but not accessed at all

Running on:
Linux version 2.6.32.12-115.fc12.i686.PAE
(mockbuild@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.4.3 20100127
(Red Hat 4.4.3-4) (GCC) ) #1 SMP Fri Apr 30 20:14:08 UTC 2010

/proc/mdstat output is as follows (ignore md9 and md20, they aren't
being used):

> Personalities : [raid1]
> md20 : active raid1 sda5[0] sdb5[1]
>       12241408 blocks [2/2] [UU]
> 
> md1 : active raid1 sdb6[1] sda6[0]
>       290503296 blocks [2/2] [UU]
>       [=====>...............]  check = 28.7% (83548928/290503296) finish=62.0min speed=55605K/sec
> 
> md9 : active raid1 sda1[0] sdb1[1]
>       1028032 blocks [2/2] [UU]
> 
> md0 : active raid1 sdb2[1] sda2[0]
>       8795520 blocks [2/2] [UU]
> 
> unused devices: <none>

My issue is that when I initiate a data-check on md1, attempts to access
md0 "hang" for long periods of time (sometimes minutes), making the
machine practically unusable. Even simple operations that are not at all
I/O intensive, like receiving a single e-mail message, can stall for minutes.

As of kernel 2.6.32 (I didn't see this while running 2.6.24-2.6.27),
the kernel also emits "hung task" warnings every few minutes while the
data-check is in progress:

# echo check > /sys/block/md1/md/sync_action
# cat /var/log/messages
> Jun  9 11:11:32 system kernel: md: data-check of RAID array md1
> Jun  9 11:11:32 system kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Jun  9 11:11:32 system kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> Jun  9 11:11:32 system kernel: md: using 128k window, over a total of 290503296 blocks.
> Jun  9 11:18:32 system kernel: INFO: task kjournald:448 blocked for more than 120 seconds.
> Jun  9 11:18:32 system kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun  9 11:18:32 system kernel: kjournald     D 00066faa     0   448      2 0x00000000
> Jun  9 11:18:32 system kernel: f614de5c 00000046 463cb427 00066faa 000039d3 00000000 f6bc0f6c 00000000
> Jun  9 11:18:32 system kernel: c0a81354 c0a85e60 f6bc0f6c c0a85e60 c0a85e60 f614de4c c0462bdf 3a0a8f3d
> Jun  9 11:18:32 system kernel: 00000000 00000000 00066faa f6bc0cc0 c1e08e60 00000000 f614deac f614de54
> Jun  9 11:18:32 system kernel: Call Trace:
> Jun  9 11:18:32 system kernel: [<c0462bdf>] ? ktime_get_ts+0x98/0xa2
> Jun  9 11:18:32 system kernel: [<c07a503f>] io_schedule+0x37/0x4e
> Jun  9 11:18:32 system kernel: [<c0501f94>] sync_buffer+0x38/0x3c
> Jun  9 11:18:32 system kernel: [<c07a54fe>] __wait_on_bit+0x39/0x60
> Jun  9 11:18:32 system kernel: [<c0501f5c>] ? sync_buffer+0x0/0x3c
> Jun  9 11:18:32 system kernel: [<c0501f5c>] ? sync_buffer+0x0/0x3c
> Jun  9 11:18:32 system kernel: [<c07a55c5>] out_of_line_wait_on_bit+0xa0/0xa8
> Jun  9 11:18:32 system kernel: [<c045b6cd>] ? wake_bit_function+0x0/0x3c
> Jun  9 11:18:32 system kernel: [<c0501ed1>] __wait_on_buffer+0x1e/0x21
> Jun  9 11:18:32 system kernel: [<c056aa86>] wait_on_buffer+0x34/0x37
> Jun  9 11:18:32 system kernel: [<c056b2dc>] journal_commit_transaction+0x7b3/0xc57
> Jun  9 11:18:32 system kernel: [<c044e742>] ? try_to_del_timer_sync+0x5e/0x66
> Jun  9 11:18:32 system kernel: [<c056de45>] kjournald+0xb8/0x1cc
> Jun  9 11:18:32 system kernel: [<c045b699>] ? autoremove_wake_function+0x0/0x34
> Jun  9 11:18:32 system kernel: [<c056dd8d>] ? kjournald+0x0/0x1cc
> Jun  9 11:18:32 system kernel: [<c045b461>] kthread+0x64/0x69
> Jun  9 11:18:32 system kernel: [<c045b3fd>] ? kthread+0x0/0x69
> Jun  9 11:18:32 system kernel: [<c0409cc7>] kernel_thread_helper+0x7/0x10
...

I don't understand why this is happening, given that md claims to be
using "idle IO bandwidth". Does "idle IO bandwidth" only consider I/O on
the same md device (i.e. md1)? Does it not back off when another md
device sharing the same underlying disks (i.e. md0) needs bandwidth
while the data-check is running?
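
For reference, these are the throttling knobs I've been looking at while
trying to understand this. The first two are the system-wide limits in
KB/sec (the 1000 and 200000 the kernel mentions in the log above); the
last three are md1's own minimum, maximum and current rate:

# cat /proc/sys/dev/raid/speed_limit_min
# cat /proc/sys/dev/raid/speed_limit_max
# cat /sys/block/md1/md/sync_speed_min
# cat /sys/block/md1/md/sync_speed_max
# cat /sys/block/md1/md/sync_speed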

I can work around this by setting md1's sync_speed_max to a value about
5 MB/sec lower than the current sync rate. But since the transfer rate
drops as the check nears the end of the disks, I have to keep lowering
sync_speed_max, otherwise the hangs resurface.
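
Concretely, with the check running at roughly 55 MB/sec (as in the
mdstat output above), that looks something like this (50000 and 45000
are just example values; writing "system" afterwards should restore the
system-wide default):

# echo 50000 > /sys/block/md1/md/sync_speed_max
...and later, as the rate drops towards the end of the disks:
# echo 45000 > /sys/block/md1/md/sync_speed_max
...then, once the check completes:
# echo system > /sys/block/md1/md/sync_speed_max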

If I force a resync of md1 by removing and re-adding sdb6 (with mdadm
-f, -r, -a), I *don't* see this problem. There are no "hung task"
warnings during the resync, and AFAICT no significant hangs on md0.
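
For completeness, the forced resync was kicked off with the usual
fail/remove/re-add sequence (long option names shown; device names as
on my system):

# mdadm /dev/md1 --fail /dev/sdb6
# mdadm /dev/md1 --remove /dev/sdb6
# mdadm /dev/md1 --add /dev/sdb6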

Any ideas on what might be causing this, or how to solve it?
Thanks!

-- 
Jordan Russell