On Mon, 12 Feb 2007 09:02:33 +1100, Neil Brown wrote
> On Sunday February 11, marcm@xxxxxxxxxxxxxxxx wrote:
> > Greetings,
> >
> > I've been running md on my server for some time now and a few days ago one of
> > the (3) drives in the raid5 array started giving read errors. The result was
> > usually system hangs and this was with kernel 2.6.17.13. I upgraded to the
> > latest production 2.6.20 kernel and experienced the same behaviour.
>
> System hangs suggest a problem with the drive controller. However
> this "kernel BUG" is something newly introduced in 2.6.20 which
> should be fixed in 2.6.20.1. Patch is below.
>
> If you still get hangs with this patch installed, then please report
> detail, and probably copy to linux-ide@xxxxxxxxxxxxxxxx
>
> NeilBrown
>
> Fix various bugs with aligned reads in RAID5.
>
> It is possible for raid5 to be sent a bio that is too big
> for an underlying device. So if it is a READ that we
> pass straight down to a device, it will fail and confuse
> RAID5.
>
> So in 'chunk_aligned_read' we check that the bio fits within the
> parameters for the target device and if it doesn't fit, fall back
> on reading through the stripe cache and making lots of one-page
> requests.
>
> Note that this is the earliest time we can check against the device
> because earlier we don't have a lock on the device, so it could
> change underneath us.
>
> Also, the code for handling a retry through the cache when a read
> fails has not been tested and was badly broken. This patch fixes
> that code.
>
> Signed-off-by: Neil Brown <neilb@xxxxxxx>
>

Thanks for the quick response, Neil. Unfortunately, the kernel doesn't build
with this patch due to a missing symbol:

WARNING: "blk_recount_segments" [drivers/md/raid456.ko] undefined!

Is that in another file that needs patching or within raid5.c?
Marc
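
For readers following the patch description quoted above, here is a rough userspace sketch of the decision it describes: check whether a read fits the target device's limits, and if not, fall back to the stripe cache. This is not the code from the patch. The types `read_req` and `dev_limits`, the functions `req_fits_device` and `choose_read_path`, and the choice of limits (a sector count and a segment count) are all hypothetical stand-ins picked for illustration; the real check in raid5.c works on the kernel's own bio and queue structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a read request; NOT the kernel's struct bio. */
struct read_req {
    unsigned int sectors;   /* request length in 512-byte sectors */
    unsigned int segments;  /* number of segments the request spans */
};

/* Hypothetical stand-in for the limits of one member device. */
struct dev_limits {
    unsigned int max_sectors;
    unsigned int max_segments;
};

enum read_path {
    READ_DIRECT,            /* pass the read straight down to the device */
    READ_VIA_STRIPE_CACHE   /* fall back to many small cached reads */
};

/* Returns true only if the request is within every limit of the device. */
static bool req_fits_device(const struct read_req *req,
                            const struct dev_limits *lim)
{
    assert(req != NULL && lim != NULL);
    return req->sectors <= lim->max_sectors &&
           req->segments <= lim->max_segments;
}

/* Picks the direct path when the request fits, the fallback otherwise. */
static enum read_path choose_read_path(const struct read_req *req,
                                       const struct dev_limits *lim)
{
    return req_fits_device(req, lim) ? READ_DIRECT : READ_VIA_STRIPE_CACHE;
}
```

In this sketch, a request larger than `max_sectors` or spanning more than `max_segments` is never sent to the device directly, which is the failure case the patch description says would otherwise confuse RAID5. The locking concern Neil mentions (the device changing underneath the check) is not modelled here.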