On Sun, Apr 16, 2006 at 08:46:52PM -0300, Carlos Carvalho wrote:
> Neil Brown (neilb@xxxxxxx) wrote on 17 April 2006 09:30:
> >The easiest thing to do when you get an error on a drive is to kick
> >the drive from the array, so that is what the code always did, and
> >still does in many cases.
> >It is arguable that for a read error on a degraded raid5, that may not
> >be the best thing to do, but I'm not completely convinced.
>
> I don't see how it could be different. If the array is degraded and
> one more disk fails there's no way to obtain the information, so the
> md device just fails like a single disk.

Not necessarily. You probably have something like (say) 200GB of data
stripes across that disk. That one read error may affect just one or a
few of them, which means there's a whole buttload of data that could
still be retrieved.

Perhaps setting the entire raid array read-only on such an error would
be better? That makes it a choice between potentially losing everything
and having writes and some reads fail while you have a mild stroke
trying to get another drive in on things.

Put the drive in, let the array do the best it can to restore things,
fail the bad drive, put another disk in, have it come up fully and then
fsck it good.

At least this way you probably have less of a chance of losing the
entire array of data and, who knows, only the 'less important' files
might be lost. :)

Anyway, my 2c. :)

--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute." - High Court Judge Michael Kirby
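
To put rough numbers on the point that a single read error touches only
one or a few stripes, here is a back-of-the-envelope sketch in Python.
Everything in it is an assumption made up for illustration and is not
taken from the md code: the 4-disk array, the 64 KiB chunk size, the
512-byte sectors, the bad sector number, and the function names. It
pessimistically treats every stripe that contains a bad sector as
entirely lost on a degraded array, and it ignores how parity rotates
across the disks.

```python
# Rough estimate only: all sizes and names here are illustrative assumptions.

def stripes_affected(bad_sectors, chunk_bytes, sector_bytes=512):
    """Count the distinct stripes touched by bad sectors on one member disk."""
    sectors_per_chunk = chunk_bytes // sector_bytes
    # Each chunk on a member disk belongs to exactly one stripe.
    return len({sector // sectors_per_chunk for sector in bad_sectors})


def worst_case_loss(bad_sectors, n_disks, disk_bytes, chunk_bytes,
                    sector_bytes=512):
    """Return (lost_bytes, total_data_bytes, lost_fraction).

    Pessimistic model: on a degraded RAID5 a stripe with an unreadable
    chunk cannot be reconstructed, so count all of its data as lost.
    """
    data_per_stripe = (n_disks - 1) * chunk_bytes   # one chunk per stripe is parity
    total_stripes = disk_bytes // chunk_bytes
    total_data = total_stripes * data_per_stripe
    lost = stripes_affected(bad_sectors, chunk_bytes, sector_bytes) * data_per_stripe
    return lost, total_data, lost / total_data


if __name__ == "__main__":
    lost, total, fraction = worst_case_loss(
        bad_sectors=[123456],        # hypothetical single unreadable sector
        n_disks=4,                   # assumed array size
        disk_bytes=200 * 10**9,      # "something like (say) 200GB" per disk
        chunk_bytes=64 * 1024,       # assumed chunk size
    )
    print(f"lost {lost} of {total} bytes ({fraction:.2e} of the data)")
```

With these made-up numbers, one bad sector costs at most 192 KiB out of
roughly 600 GB of data, which is why failing the whole array over it
looks heavy-handed.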