On the first error, the system currently appears to simply abandon the drive, forcing the array into degraded mode for all I/O that follows. A much more reasonable approach would be to not abandon the drive completely, but instead to build a fast lookup table of known bad blocks. Accesses to most areas of the array could then continue without degradation; only areas containing bad blocks would be forced into degraded mode.

Many drives will trash a sector if power drops during a write, and that sector will return read errors until it is rewritten. On those drives it makes sense to recover the data in degraded mode, rewrite the sector, and then verify it. If the verify fails and the drive supports dynamic sparing/remapping, the sector should be remapped, rewritten, and verified again.

On a large 200GB array, this single feature would remove nearly a day of reconstruction time for ordinary errors and sector failures, substantially improving realized reliability. Dynamic error management of this kind would have removed 99% of the gross software RAID device failures I have seen over the last year.
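To make the idea concrete, here is a rough user-space simulation of the scheme: a per-region bad-block table plus the recover/rewrite/verify/remap loop. Every name in it (sim_read, sim_write, sim_remap, bad_table, try_repair_sector) is a made-up stand-in to keep the sketch self-contained; real code would live in the md driver and talk to the block layer and the drive's defect management, not an in-memory array.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define SECTOR_SIZE 512
#define NSECTORS    16

/* --- Simulated single drive -------------------------------------- */
static unsigned char drive_data[NSECTORS][SECTOR_SIZE];
static bool soft_bad[NSECTORS];  /* trashed by power loss; a rewrite clears it */
static bool hard_bad[NSECTORS];  /* failing media; needs a remap */
static bool can_remap = true;    /* drive supports dynamic sparing/remapping */

static bool sim_read(int s, unsigned char *buf)
{
    if (soft_bad[s] || hard_bad[s])
        return false;                    /* read error */
    memcpy(buf, drive_data[s], SECTOR_SIZE);
    return true;
}

static bool sim_write(int s, const unsigned char *buf)
{
    memcpy(drive_data[s], buf, SECTOR_SIZE);
    soft_bad[s] = false;         /* a good write clears a power-loss error */
    return true;                 /* hard_bad still fails the verify read */
}

static bool sim_remap(int s)
{
    if (!can_remap)
        return false;
    hard_bad[s] = false;         /* drive substitutes a spare sector */
    return true;
}

/* Degraded-mode recovery: rebuild the sector from the surviving drives.
 * Faked here with a fixed pattern. */
static void reconstruct_from_parity(int s, unsigned char *buf)
{
    (void)s;
    memset(buf, 0xAB, SECTOR_SIZE);
}

/* Recover, rewrite, verify; if the verify fails, remap, rewrite, and
 * verify again. */
static bool try_repair_sector(int s)
{
    unsigned char data[SECTOR_SIZE], check[SECTOR_SIZE];

    reconstruct_from_parity(s, data);

    if (sim_write(s, data) && sim_read(s, check) &&
        memcmp(data, check, SECTOR_SIZE) == 0)
        return true;             /* rewrite alone fixed it */

    if (sim_remap(s) && sim_write(s, data) && sim_read(s, check) &&
        memcmp(data, check, SECTOR_SIZE) == 0)
        return true;             /* fixed after remapping */

    return false;                /* leave it in the bad-block table */
}

/* The proposed fast bad-block table: a plain bitmap for simplicity; a
 * real one would want a sparser structure.  Only sectors in the table
 * force degraded mode; everything else is served normally. */
static bool bad_table[NSECTORS];

static void array_read(int s, unsigned char *buf)
{
    if (!bad_table[s] && sim_read(s, buf))
        return;                          /* fast path: not degraded */
    bad_table[s] = true;                 /* remember the bad block */
    reconstruct_from_parity(s, buf);     /* degraded read, this region only */
    if (try_repair_sector(s))
        bad_table[s] = false;            /* repaired in place */
}

int main(void)
{
    unsigned char buf[SECTOR_SIZE];

    soft_bad[5] = true;                  /* sector trashed by a power drop */
    hard_bad[9] = true;                  /* genuinely failing sector */

    array_read(5, buf);
    printf("sector 5 %s\n", bad_table[5] ? "still degraded" : "repaired");
    array_read(9, buf);
    printf("sector 9 %s\n", bad_table[9] ? "still degraded" : "repaired");
    return 0;
}

John Bass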