Re: soft-lockup in raid5 / 2.6.24

dean gaudet <dean@xxxxxxxxxx> · Sun, 13 Apr 2008 21:23:28 -0700 (PDT)

On Mon, 7 Apr 2008, Dan Williams wrote:

> On Mon, Apr 7, 2008 at 8:22 AM, dean gaudet <dean@xxxxxxxxxx> wrote:
> > while my system was doing its monthly check (debian) i bumped into
> >  soft-lockups in the raid5 code.  i must mention this is a rather mature
> >  system, the disks are 3.5 years old at this point... so i'm not surprised
> >  that the check will find some bad sectors which will take the
> >  device/driver a while to read/correct.
> >
> >  all SMART events in this log are on disks which are part of /dev/md4 which
> >  is a raid5.  the controller is a 3ware 7508, the disks are seagate
> >  ST3400832A.
> >
> >  apparently no errors propagated all the way up to raid5, but based on the
> >  SMART events i'm pretty sure several sectors took a long time to finish
> >  being read and were probably corrected by the drive itself.
> >
> >  this is debian kernel image 2.6.24-1-686 version 2.6.24-4 which contains
> >  upstream 2.6.24.2.
> >
> >  -dean
> >
> >  root@neemlark:~# cat /sys/block/md4/md/mismatch_cnt
> >  0
> >
> 
> This does not look like the hang that is set to be fixed in 2.6.24.5.
> Does /proc/mdstat show that the resync is progressing, or does it
> appear to be stalled?

oh it finished, sorry i should have said that :)

there were just those various soft lockup warnings...

as a wild stab in the dark -- does the "faulty" test mode support
delaying reads or writes for a very long time?  (or is there some other
fake block device we can inject long delays with?)
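
one candidate for the "other fake block device", offered as an untested sketch and not something anyone in this thread has confirmed: the device-mapper "delay" target, which holds each i/o for a fixed number of milliseconds before passing it to the backing device.  the helper name delay_table, the loop device and the 30 second delay below are all made up for illustration, and this assumes dm-delay is built for the running kernel.

```shell
# sketch only: build a device-mapper "delay" table line.
# usage: delay_table <size-in-512-byte-sectors> <backing-device> <delay-ms>
# (delay_table is a hypothetical helper, not part of dmsetup)
delay_table() {
    # dm table format: <start> <length> delay <device> <offset> <delay-ms>
    printf '0 %s delay %s 0 %s\n' "$1" "$2" "$3"
}

# example, as root, on a scratch loop device (never a live array member):
#   size=$(blockdev --getsz /dev/loop0)      # size in 512-byte sectors
#   delay_table "$size" /dev/loop0 30000 | dmsetup create slowdisk
#   ... build a throwaway raid5 that includes /dev/mapper/slowdisk ...
#   dmsetup remove slowdisk
```

this delays every request uniformly, so it only approximates a disk that stalls on a handful of bad sectors.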

-dean