Mark Lord wrote:
>> sdb: Current: sense key: Medium Error
>>     Additional sense: Unrecovered read error - auto reallocate failed
>> end_request: I/O error, dev sdb, sector 398283329
>> raid1: Disk failure on sdb2, disabling device.
>>     Operation continuing on 1 devices
>
>
> Oh good, *now* we've gotten somewhere!!
>
> Albert / Jens / Jeff:
>
> The command failing above is SCSI WRITE_10, which is being
> translated into ATA_CMD_WRITE_FUA_EXT by libata.
>
> This command fails -- unrecognized by the drive in question.
> But libata reports it (most incorrectly) as a "medium error",
> and the drive is taken out of service from its RAID.
>
> Bad, bad, and worse.
>
> Libata should really recover from this, by recognizing that
> the command was rejected, and replacing it with a simple
> WRITE_EXT instead.  Possibly followed by FLUSH_CACHE.
>
> So.. I've forgotten who put FUA into libata, but hopefully
> it's one of the folks on the CC: list, and that nice person
> can now generate a patch to fix this bug somehow.

Thanks Mark

I'm glad it's a bug and not bad hardware.

I am quite concerned that the basic effect of just booting a practically
vanilla 2.6.16-rc4 like this was to fry my raid array. Luckily it dropped
2 (of 3) disks so quickly that the event counter was the same allowing an
easy rebuild.

2.6.15 has similar issues but they seem to happen *very* infrequently by
comparison - this hit me several times during a single boot.

Should Linus (cc'ed) hold off on 2.6.16 because of this or not?

David