Re: raid10 failing to fail...

Arthur Jones <ajones@xxxxxxxxxxxx> · Tue, 22 Jul 2008 14:07:13 -0700

Hi all, ...

On Mon, Jul 21, 2008 at 12:00:18PM -0700, Arthur Jones wrote:
> Hi All,  I have a raid10 array, md0.  When I
> fail all reads on the first underlying block device
> (/dev/sda5), reads from the array hang forever.
> 
> This is easy for me to reproduce (blkflsbuf is
> a simple C program that does a BLKFLSBUF ioctl):
> 
> # mount -t debugfs debugfs /debug
> # echo 1 > /debug/fail_make_request/interval
> # echo 100 > /debug/fail_make_request/probability
> # echo 0 > /debug/fail_make_request/space
> # echo N > /debug/fail_make_request/task-filter
> # echo -1 > /debug/fail_make_request/times
> # echo 2 > /debug/fail_make_request/verbose
> # ./blkflsbuf /dev/md0
> # ./blkflsbuf /dev/sda5
> # echo 1 > /sys/block/sda/sda5/make-it-fail
> # dd if=/dev/sda5 of=/dev/null count=1
> [fails as expected]
> # ./blkflsbuf /dev/sda5
> # dd if=/dev/md0 of=/dev/null count=1
> [hangs here -- kill -INT, -TERM, -KILL
>  have no effect...]
> 
> I expected to get the reads coming from the
> working mirror and to have the failing disk
> marked Faulty.  This is tested w/ today's
> git tree, but seems to happen on every kernel
> I've tried (2.6.9 RHEL4, 2.6.26).
> 
> Any ideas?  Thanks...
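
For anyone trying to reproduce this: the blkflsbuf source was not posted,
so the following is only a minimal sketch of what such a helper could look
like, assuming it does nothing more than open each device and issue the
BLKFLSBUF ioctl.  The names flush_block_buffers() and blkflsbuf_run() are
hypothetical, as is the error handling.

```c
/* Hypothetical reconstruction of a "blkflsbuf"-style helper.  The original
 * program was not posted; function names and error handling are assumptions. */
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>   /* BLKFLSBUF */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the kernel to flush the buffer cache of one block device.
 * Returns 0 on success, -1 on any failure (open or ioctl). */
int flush_block_buffers(const char *path)
{
    int fd, rc;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }

    /* BLKFLSBUF takes no argument; it normally needs CAP_SYS_ADMIN. */
    rc = ioctl(fd, BLKFLSBUF, 0);
    if (rc < 0)
        perror("ioctl(BLKFLSBUF)");

    close(fd);
    return rc < 0 ? -1 : 0;
}

/* Flush every device named on the command line.
 * Returns a process exit status: 0 = all ok, 1 = some failed, 2 = usage. */
int blkflsbuf_run(int argc, char **argv)
{
    int i, status = 0;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <blockdev>...\n", argv[0]);
        return 2;
    }
    for (i = 1; i < argc; i++)
        if (flush_block_buffers(argv[i]) != 0)
            status = 1;
    return status;
}
```

A one-line main() that returns blkflsbuf_run(argc, argv) turns this into a
standalone tool, used as in the steps above (./blkflsbuf /dev/md0).  The
flush matters in the reproduction because it drops cached blocks, so the
following dd has to go back to the underlying device.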

The hung read seems to be stuck in wait_event_lock_irq() inside
freeze_array() in raid10.c.  This may be similar to:

http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/0401.html

Arthur