Re: LibPATA code issues / 2.6.15.4

David Greaves <david@xxxxxxxxxxxx> · Sun, 26 Feb 2006 09:56:27 +0000

Mark Lord wrote:

>> sdb: Current: sense key: Medium Error
>>     Additional sense: Unrecovered read error - auto reallocate failed
>> end_request: I/O error, dev sdb, sector 398283329
>> raid1: Disk failure on sdb2, disabling device.
>>         Operation continuing on 1 devices
>
>
> Oh good, *now* we've gotten somewhere!!
>
> Albert / Jens / Jeff:
>
> The command failing above is SCSI WRITE_10, which is being
> translated into ATA_CMD_WRITE_FUA_EXT by libata.
>
> This command fails -- unrecognized by the drive in question.
> But libata reports it (most incorrectly) as a "medium error",
> and the drive is taken out of service from its RAID.
>
> Bad, bad, and worse.
>
> Libata should really recover from this, by recognizing that
> the command was rejected, and replacing it with a simple
> WRITE_EXT instead.  Possibly followed by FLUSH_CACHE.
>
> So.. I've forgotten who put FUA into libata, but hopefully
> it's one of the folks on the CC: list, and that nice person
> can now generate a patch to fix this bug somehow.
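
To make the recovery Mark describes concrete, here is a minimal user-space
sketch in C. It is not libata code. It assumes a drive rejects an unknown
command by setting ABRT in the error register. The opcode and error-bit
values are the standard ATA ones; issue_fn, fua_state, fake_drive,
fake_issue and write_with_fua_fallback are hypothetical names made up for
this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard ATA opcodes and error-register bits. */
#define ATA_CMD_WRITE_EXT      0x35      /* WRITE DMA EXT */
#define ATA_CMD_WRITE_FUA_EXT  0x3D      /* WRITE DMA FUA EXT */
#define ATA_CMD_FLUSH_EXT      0xEA      /* FLUSH CACHE EXT */
#define ATA_ABORTED            (1 << 2)  /* ABRT: command aborted/rejected */

/* Hypothetical transport hook: issues one command and returns the
 * drive's error register, with 0 meaning success. */
typedef uint8_t (*issue_fn)(void *dev, uint8_t cmd,
                            uint64_t lba, uint16_t nsect);

/* Hypothetical per-device state: cleared once FUA has been rejected. */
struct fua_state {
    bool fua_ok;
};

/* Hypothetical stand-in for a drive, used only to exercise the logic. */
struct fake_drive {
    bool supports_fua;
    int  flushes;       /* number of FLUSH CACHE EXT commands seen */
};

static uint8_t fake_issue(void *dev, uint8_t cmd,
                          uint64_t lba, uint16_t nsect)
{
    struct fake_drive *d = dev;

    (void)lba;
    (void)nsect;
    if (cmd == ATA_CMD_WRITE_FUA_EXT && !d->supports_fua)
        return ATA_ABORTED;     /* unknown command: rejected, not a media error */
    if (cmd == ATA_CMD_FLUSH_EXT)
        d->flushes++;
    return 0;
}

/* Try a FUA write; if the drive aborts it, stop using FUA on this device
 * and retry as a plain WRITE_EXT followed by a cache flush.
 * Returns 0 on success, otherwise the error register of the command
 * that failed. */
static uint8_t write_with_fua_fallback(struct fua_state *st, issue_fn issue,
                                       void *dev, uint64_t lba, uint16_t nsect)
{
    uint8_t err;

    if (st->fua_ok) {
        err = issue(dev, ATA_CMD_WRITE_FUA_EXT, lba, nsect);
        if (err == 0)
            return 0;
        if (!(err & ATA_ABORTED))
            return err;         /* not a rejection: report it unchanged */
        st->fua_ok = false;     /* remember the rejection for later writes */
    }

    err = issue(dev, ATA_CMD_WRITE_EXT, lba, nsect);
    if (err)
        return err;
    return issue(dev, ATA_CMD_FLUSH_EXT, 0, 0);
}
```

With a fake_drive whose supports_fua is false, the first call falls back to
WRITE_EXT plus one flush and clears fua_ok, so later writes skip the FUA
attempt entirely. The point is only that a rejected command should lead to a
retry, not to a medium error that gets the disk dropped from the array.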

Thanks, Mark.

I'm glad it's a bug and not bad hardware.

I am quite concerned that simply booting a practically vanilla
2.6.16-rc4 like this was enough to fry my RAID array.

Luckily it dropped 2 (of 3) disks so quickly that the event counter was
the same on both, allowing an easy rebuild.

2.6.15 has similar issues, but they seem to happen *very* infrequently by
comparison - this hit me several times during a single boot.

Should Linus (cc'ed) hold off on 2.6.16 because of this or not?

David
