Re: 2.6.39: raid1 check blocks jbd on other md more than 120 seconds

Frank van Maarseveen <frankvm@xxxxxxxxxxx> · Fri, 3 Jun 2011 14:36:46 +0200

On Fri, Jun 03, 2011 at 08:08:01AM -0400, Thomas Harold wrote:
> On 6/2/2011 5:36 AM, Frank van Maarseveen wrote:
> >The system runs FC14 with an (almost) stock 2.6.39 kernel, configured to
> >panic if it seems to hang. That's exactly what started to happen without
> >anything being logged in the normal way except over netconsole.
> >
> >/proc/mdstat:
> >Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4]
> >md3 : active raid1 sda3[0] sdb3[1]
> >       1885338488 blocks super 1.2 [2/2] [UU]
> >
> >md1 : active raid1 sda1[0] sdb1[1]
> >       33555384 blocks super 1.2 [2/2] [UU]
> >
> >kernel messages:
> >	(/etc/cron.weekly/99-raid-check kicks in)
> >Jun  2 04:04:00 janus md: data-check of RAID array md3
> >Jun  2 04:04:00 janus md: delaying data-check of md1 until md3 has finished (they share one or more physical units)
> >Jun  2 04:04:00 janus md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> >Jun  2 04:04:00 janus md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> >Jun  2 04:04:00 janus md: using 128k window, over a total of 1885338488 blocks.
> >Jun  2 04:55:54 janus INFO: task jbd2/md1-8:1188 blocked for more than 120 seconds.
> >Jun  2 04:55:54 "echo 0>  /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >Jun  2 04:55:54 janus jbd2/md1-8     D
> 
> That's a bug that you'll see in CentOS/RHEL in cases where there are
> multiple arrays to be checked, that use the same set of disks.  I
> first saw it in CentOS 5.5 (or maybe 5.6).
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=573106
> 
> It's an annoying message, but the weekly raid sync runs fine.

According to the bugzilla report, it was the resync itself that got stuck.
What I am seeing is different: any random program may get stuck, and
depending on kernel configuration that may trigger a kernel panic. The most
recent occurrence:

Jun  2 18:48:44 janus kernel: INFO: task master:2705 blocked for more than 120 seconds.
Jun  2 18:48:44 janus kernel: INFO: task pickup:19276 blocked for more than 120 seconds.
Jun  2 18:50:44 janus kernel: INFO: task jbd2/md1-8:1187 blocked for more than 120 seconds.
Jun  2 18:50:45 janus kernel: INFO: task python:1890 blocked for more than 120 seconds.
Jun  2 19:28:45 janus kernel: INFO: task master:2705 blocked for more than 120 seconds.
Jun  2 19:28:45 janus kernel: INFO: task pickup:20589 blocked for more than 120 seconds.
Jun  2 19:34:45 janus kernel: INFO: task jbd2/md1-8:1187 blocked for more than 120 seconds.
Jun  2 19:34:45 janus kernel: INFO: task master:2705 blocked for more than 120 seconds.
Jun  2 19:34:45 janus kernel: INFO: task qmgr:2718 blocked for more than 120 seconds.
Jun  2 19:34:45 janus kernel: INFO: task pickup:20589 blocked for more than 120 seconds.
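
To see at a glance which tasks are affected, messages like the ones above
can be tallied per task name. This is only a rough sketch: the regular
expression assumes the exact "INFO: task NAME:PID blocked for more than N
seconds." wording shown in this log (other kernels may phrase it
differently), and tally_hung_tasks is a made-up helper name, not part of
any existing tool.

```python
import re
from collections import Counter

# Assumed message format, taken from the log lines quoted in this mail:
#   "... INFO: task <name>:<pid> blocked for more than <secs> seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def tally_hung_tasks(lines):
    """Count hung-task warnings per task name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            # Group by task name only; the PID changes when a task respawns
            # (e.g. pickup:19276 vs. pickup:20589 above).
            counts[match.group("name")] += 1
    return counts
```

Fed the ten messages above, it reports master and pickup three times each,
jbd2/md1-8 twice, and python and qmgr once each, which matches the point
that it is not just the journal thread on md1 that stalls.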

-- 
Frank