On Wed, 13 Mar 2013 12:29:24 -0500 Jonathan Brassow <jbrassow@xxxxxxxxxx> wrote:

> Neil,
>
> I've noticed that when too many devices fail in a RAID array that
> additional I/O will hang, yielding an endless supply of:
> Mar 12 11:52:53 bp-01 kernel: Buffer I/O error on device md1, logical block 3
> Mar 12 11:52:53 bp-01 kernel: lost page write due to I/O error on md1
> Mar 12 11:52:53 bp-01 kernel: sector=800 i=3 (null) (null) (null) (null) 1

This is the third report in as many weeks that mentions that WARN_ON.
The first two had quite different causes. I think this one has the same
cause as the first one, which means it would be fixed by

  md/raid5: schedule_construction should abort if nothing to do.

which is commit 29d90fa2adbdd9f in linux-next.

> Mar 12 11:52:53 bp-01 kernel: ------------[ cut here ]------------
> Mar 12 11:52:53 bp-01 kernel: WARNING: at drivers/md/raid5.c:354 init_stripe+0x2d4/0x370 [raid456]()
>
> Are other people seeing this, or is this an artifact of the way I am killing
> devices ('echo offline > /sys/block/$dev/device/state')?

That is a perfectly good way to kill a device.

>
> I would prefer to get immediate errors if nothing can be done to satisfy the
> request and I've been thinking of something like the attached patch. The
> patch below is incomplete. It does not take into account any reshaping that
> is going on, nor does it try to figure out if a mirror set in RAID10 has died;
> but I hope it gets the basic idea across.
>
> Is this a good way to handle this situation, or am I missing something?

I think we do get immediate errors (once all bugs are fixed).
Your patch does extra work for every request, and that work is only of
value if the array has failed - and it really doesn't make sense to
optimise for a failed array.
The current approach is to just try to satisfy a request and, once we
find that we need to do something that is impossible, return an error
at that point. I think that is best.

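To make the trade-off concrete, here is a rough userspace sketch of the two approaches for a read. It is only an illustration: `struct toy_array`, `count_failed()`, `read_eager()` and `read_lazy()` are made-up names, not the md/raid5 code, and the model ignores reshape, RAID10 and writes entirely.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_DISKS 16

/* Hypothetical toy model of an array; not the real md data structures. */
struct toy_array {
	int ndisks;                  /* number of member devices */
	int parity;                  /* failures the level tolerates: 1 for RAID5, 2 for RAID6 */
	bool faulty[TOY_MAX_DISKS];  /* true if that member has failed */
};

/* Walk every member and count the failed ones. */
static int count_failed(const struct toy_array *a)
{
	int i, n = 0;

	for (i = 0; i < a->ndisks; i++)
		if (a->faulty[i])
			n++;
	return n;
}

/* Eager: check for a dead array before every request, healthy or not. */
static int read_eager(const struct toy_array *a, int disk)
{
	(void)disk;                  /* the check does not depend on the target */
	if (count_failed(a) > a->parity)
		return -EIO;
	return 0;                    /* pretend the read succeeded */
}

/* Lazy: just try the request; fail only where progress becomes impossible. */
static int read_lazy(const struct toy_array *a, int disk)
{
	if (!a->faulty[disk])
		return 0;            /* common case: no extra work at all */
	if (count_failed(a) > a->parity)
		return -EIO;         /* block cannot be reconstructed */
	return 0;                    /* pretend it was rebuilt from the survivors */
}
```

On a healthy array read_lazy() never calls count_failed(), while read_eager() pays for the scan on every request. In this toy model the lazy variant also still serves reads that land on a surviving device after the array has failed.
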
Can you try the commit I identified and see if it makes the problem go
away?

Thanks,
NeilBrown