Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable

Hello, Andi.

(cc'ing linux-scsi and quoting whole message)

On 06/13/2010 05:48 PM, Andi Kleen wrote:
> 
> Hi,
> 
> On 2.6.34:
> 
> While writing some data to an old PATA Maxtor disk connected to a
> PDC20268 Promise controller using the libata driver there were some
> IO errors.
> 
> After some time those resulted in an endless error message loop that
> made the system essentially unusable (console was flooded and
> unusable, ssh was extremely slow etc.):
> 
> This does not exactly look like graceful error handling.
> 
> Excerpts from the log (full version available on request)
> 
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
> ata12: EH complete
> ata12.00: limiting speed to UDMA/66:PIO4
> ata12: soft resetting link
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> quiet_error: 10 callbacks suppressed
> ata12: EH complete
> EXT4-fs (dm-0): mounted filesystem with ordered data mode
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs (dm-1): using internal journal
> EXT3-fs (dm-1): mounted filesystem with writeback data mode
> EXT4-fs (dm-2): mounted filesystem with ordered data mode
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> 
> ...
> 
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> quiet_error: 10 callbacks suppressed
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> 
> 
> ... lots of similar messages until it goes down to PIO0 then some more errors ....
> 
> 
> ata12: soft resetting link
> ata12: soft resetting link
> ata12: link is slow to respond, please be patient (ready=0)
> ata12.00: qc timeout (cmd 0xec)
> ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata12: link is slow to respond, please be patient (ready=0)
> ata12: soft resetting link
> ata12.00: disabled
> ata12: soft resetting link
> ata12: EH complete

At this point, the drive stopped responding and libata removed the
drive from the system.
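
For reference, the descriptor sense data quoted in the log above can be
decoded by hand. The following is only a rough sketch, assuming the
standard SPC descriptor-format layout (response code 0x72, sense key in
byte 1, ASC/ASCQ in bytes 2-3, and an information descriptor of type
0x00 carrying the failing LBA). The function name is made up for
illustration and is not part of any kernel or library API.

```python
def decode_descriptor_sense(hex_bytes: str) -> dict:
    """Decode descriptor-format SCSI sense data given as a hex string.

    Hypothetical helper for reading log excerpts; the layout follows the
    SPC descriptor sense format as understood here.
    """
    data = bytes.fromhex(hex_bytes)
    response_code = data[0] & 0x7F
    if response_code not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")

    result = {
        "sense_key": data[1] & 0x0F,  # 0x3 = Medium Error
        "asc": data[2],               # additional sense code
        "ascq": data[3],              # additional sense code qualifier
        "information": None,          # failing LBA, if reported
    }

    # Byte 7 holds the length of the descriptor area that follows.
    end = min(8 + data[7], len(data))
    offset = 8
    while offset + 2 <= end:
        desc_type = data[offset]
        desc_len = data[offset + 1]
        body = data[offset + 2 : offset + 2 + desc_len]
        # Type 0x00 is the information descriptor; bit 7 of its first
        # body byte is VALID, and the last 8 bytes are the information.
        if desc_type == 0x00 and len(body) >= 10 and body[0] & 0x80:
            result["information"] = int.from_bytes(body[2:10], "big")
        offset += 2 + desc_len
    return result


sense = decode_descriptor_sense(
    "72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 00 5b 23 bb"
)
print(sense["sense_key"], hex(sense["asc"]), hex(sense["information"]))
```

Under those assumptions the information field comes out as 0x5b23bb,
which falls inside the first failed read (0x5b234f + 0x80 blocks) and
is the start LBA of the later 4-block reads in the log.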

> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e7 08 00 01 00 00

As the device is gone, any command is immediately failed with
DID_BAD_TARGET.
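
The CDB lines in the log show which blocks each failed command was
addressing. A minimal sketch for decoding them, assuming the usual
READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes
2-5, transfer length in bytes 7-8); decode_rw10 is a hypothetical
helper name, not an existing tool.

```python
def decode_rw10(cdb_hex: str) -> tuple:
    """Return (operation, lba, blocks) for a READ(10) or WRITE(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("expected a 10-byte READ(10) or WRITE(10) CDB")
    operation = "READ" if cdb[0] == 0x28 else "WRITE"
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return operation, lba, blocks


# Two consecutive CDBs copied from the log.
first = decode_rw10("2a 00 00 e5 e7 08 00 01 00 00")
second = decode_rw10("2a 00 00 e5 e8 08 00 01 00 00")
print(first, second, second[1] - first[1])
```

Decoded this way, the failing writes are 256 blocks each and their
LBAs advance by exactly 256 from one command to the next, i.e. one
long sequential run of writes being failed one command at a time.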

> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> sd 11:0:0:0: [sdd] Unhandled error code
> 
> and finally an endless flood of
> 
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e8 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e9 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ea 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 eb 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ec 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ed 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ee 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ef 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
> 
> same messages repeating forever, just with CDB changing occasionally.
> 
> ....
> 
> not stopping until I reset the box.

Did you have a lot of dirty pages?  It looks like the upper layer is
trying to flush all the dirty buffers, and SCSI is a tad too verbose
about failing each IO with DID_BAD_TARGET, so it takes a very long
time if there are many to fail.
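
One way to check is to look at the Dirty and Writeback counters in
/proc/meminfo while the flood is running. The snippet below is only a
sketch: the function names are invented here, the 512-byte block size
is an assumption about the disk, and the 256-block command size is
taken from the Write(10) CDBs in the log. The estimate is an upper
bound, since not all dirty pages need belong to sdd.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def estimated_failed_writes(dirty_kb: int, blocks_per_write: int = 256,
                            block_size: int = 512) -> int:
    """Rough count of write commands needed to flush dirty_kb of data.

    block_size=512 is an assumption; blocks_per_write=256 matches the
    Write(10) CDBs in the log.
    """
    bytes_per_write = blocks_per_write * block_size
    return -(-dirty_kb * 1024 // bytes_per_write)  # ceiling division


# Illustrative input only; these numbers are NOT from the report.
sample = "MemTotal:  4046012 kB\nDirty:  1048576 kB\nWriteback:  0 kB\n"
info = parse_meminfo(sample)
print(info["Dirty"], estimated_failed_writes(info["Dirty"]))
```

On a live system the input would come from reading /proc/meminfo
instead of the sample string. Each failed command in the log produces
three lines (error code, Result, CDB), so the message count would be
roughly three times the number of commands.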

Thanks.

-- 
tejun