Re: [PATCH 1/1] libata: use AC_ERR_TIMEOUT err_mask for time out

Jeff Garzik wrote:
Albert Lee wrote:
Use AC_ERR_TIMEOUT err_mask for time out,
instead of generating err_mask from drv_stat.

Signed-off-by: Albert Lee <albertcc@xxxxxxxxxx>
---
This patch was accepted in irq-pio before.

Patch against upstream (8b316a3973f05e572b4edeeda9072987f6bbaa44).
For your review, thanks.

Albert

--- upstream0/drivers/scsi/libata-core.c 2006-03-31 13:33:15.000000000 +0800
+++ upstream1/drivers/scsi/libata-core.c 2006-03-31 13:34:59.000000000 +0800
@@ -3827,7 +3827,7 @@ static void ata_qc_timeout(struct ata_qu
                ap->id, qc->tf.command, drv_stat, host_stat);
/* complete taskfile transaction */
-        qc->err_mask |= ac_err_mask(drv_stat);
+        qc->err_mask |= AC_ERR_TIMEOUT;

The intention here is to catch missed interrupts; due to flaky hardware or buggy drivers, we may miss an interrupt, but still be able to complete the qc in the timeout handler. Thus, if the status shows BSY, DRQ, or DF asserted, ac_err_mask() will indicate an error; otherwise it will not.

So, NAK, and we should fix irq-pio for this as well...


Hello, Jeff. Hello, Albert.

I think we must not successfully finish a qc after timeout. If a device hasn't replied in 30 seconds, it's best to assume the device is in an unknown state. ATA TF registers just don't have enough information to tell whether the device looks idle because it finished the command or because it has simply forgotten about everything.

For example, if a communication error occurs while a qc is being issued and the controller fails to report it, the device will sit in the ready state until the qc times out. It knows nothing about the command, but ->eng_timeout will complete the command successfully, potentially causing data corruption.
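
To make the scenario above concrete, here is a rough user-space sketch, not the actual libata code. The status bit values are the standard ATA ones; every SKETCH_* name, both function names, and the exact status-to-mask mapping are assumptions made for illustration only. It contrasts deriving the error mask from the status register with always flagging a timeout.

```c
#include <stdint.h>

/* ATA status register bits (values from the ATA specification). */
#define SKETCH_ATA_BUSY 0x80u /* BSY: device busy */
#define SKETCH_ATA_DRDY 0x40u /* DRDY: device ready */
#define SKETCH_ATA_DF   0x20u /* DF: device fault */
#define SKETCH_ATA_DRQ  0x08u /* DRQ: data request */
#define SKETCH_ATA_ERR  0x01u /* ERR: error */

/* Hypothetical error-mask bits; not the kernel's AC_ERR_* values. */
#define SKETCH_ERR_DEV     (1u << 0)
#define SKETCH_ERR_HSM     (1u << 1)
#define SKETCH_ERR_TIMEOUT (1u << 2)

/*
 * Policy A (assumed mapping): derive the error mask from the status
 * register alone. A status with none of the bits below set yields 0,
 * i.e. the command would be completed as successful.
 */
static unsigned int sketch_err_mask_from_status(uint8_t status)
{
	if (status & (SKETCH_ATA_BUSY | SKETCH_ATA_DRQ))
		return SKETCH_ERR_HSM;
	if (status & (SKETCH_ATA_ERR | SKETCH_ATA_DF))
		return SKETCH_ERR_DEV;
	return 0;
}

/*
 * Policy B: a timed-out command is always an error, no matter what
 * the status register shows when the timeout handler runs.
 */
static unsigned int sketch_err_mask_on_timeout(uint8_t status)
{
	(void)status; /* deliberately ignored */
	return SKETCH_ERR_TIMEOUT;
}
```

With a device that never saw the command and reports only DRDY, sketch_err_mask_from_status(SKETCH_ATA_DRDY) returns 0, so the command would look successful, while sketch_err_mask_on_timeout() still reports an error.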

Or, if a power fluctuation or static discharge breaks the link while a qc is in progress (this happens very easily with SATA; gigahertz serial signals are easy to disrupt), it can result in a PHY reset, causing the device to forget about the command. A similar result can be obtained with the right combination of device and controller by simply unplugging and replugging the cable while commands are in progress. (I've done a lot of it.)

It just isn't worth successfully completing a command after 30 seconds at the risk of data corruption. If the device/controller times out on most commands, the combination is unusable no matter what we do on timeout (one command every 30 secs...). If the combination fails occasionally, retrying doesn't take much time and is much safer.
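
The "fail on timeout, then retry" policy argued for above could look roughly like the following user-space sketch. Everything in it is hypothetical: the sketch_* names, the result enum, the callback type, and the stub device are illustrative assumptions and do not correspond to any libata or SCSI interface.

```c
/* Hypothetical outcome of issuing one command. */
enum sketch_result { SKETCH_OK, SKETCH_TIMEOUT, SKETCH_DEV_ERROR };

/* Hypothetical callback that issues the command once and reports the outcome. */
typedef enum sketch_result (*sketch_issue_fn)(void *ctx);

/*
 * Issue a command up to max_tries times. A timeout is never treated
 * as success; it only triggers another attempt. Any other outcome
 * (success or a device-reported error) ends the loop immediately.
 * The number of attempts made is stored in *tries_used if non-NULL.
 */
static enum sketch_result sketch_issue_with_retry(sketch_issue_fn issue,
						  void *ctx, int max_tries,
						  int *tries_used)
{
	enum sketch_result res = SKETCH_TIMEOUT;
	int tries = 0;

	while (tries < max_tries) {
		res = issue(ctx);
		tries++;
		if (res != SKETCH_TIMEOUT)
			break;
	}
	if (tries_used)
		*tries_used = tries;
	return res;
}

/*
 * Example stub device: ctx points to an int holding how many more
 * times the command should time out before it finally succeeds.
 */
static enum sketch_result sketch_flaky_issue(void *ctx)
{
	int *timeouts_left = ctx;

	if (*timeouts_left > 0) {
		(*timeouts_left)--;
		return SKETCH_TIMEOUT;
	}
	return SKETCH_OK;
}
```

A device that times out once and then works succeeds on the second attempt, while one that keeps timing out exhausts max_tries and is reported as failed rather than silently completed.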

Thanks.

--
tejun