Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W)

On Tue, Jan 30, 2007 at 03:37:36PM -0800, Gary Hade wrote:
> On Tue, Jan 30, 2007 at 04:32:34PM +0900, Tejun Heo wrote:
> > Hello, Gary.
> > 
> > Gary Hade wrote:
> > >>> If they verify your fix (ie,
> > >>> GoVault sometimes takes more than 150ms to transmit the first D2H Reg FIS
> > >>> after SRST), I'll push similar patch upstream.
> > >> Thanks.  If you think that changes to increase the delays are
> > >> the way to go (at least until we can find a better solution)
> > >> I can provide patches.
> > > 
> > > Tejun, 
> > > I haven't heard anything from you on this so I'm including a delay
> > > increase patch against 2.6.20-rc6 for the 'ata-piix' case below.  
> > > I hope that you, Jeff, and others find this acceptable.
> > 
> > Sorry about being unresponsive.  The thing is that the change adds
> > unnecessary 2 secs of delay to a lot of other normal device-not-present
> > cases, so I was hesitant to ack the patch.  I'll give it more thoughts
> > (and respond timely this time :-)
> 
> Thanks!  My followup was untimely so we're even. :-)
> 
> Some of my random thoughts:
> There does appear to be this invalid assumption that 0xFF status 
> always implies device-not-present.  The status register access 
> restrictions in ATA/ATAPI-7 V1 5.14.2 include the statement "The 
> contents of this register, except for BSY, shall be ignored when 
> BSY is set to one." which the code does not honor.  There is apparently 
> past experience that 0xFF status implies device-not-present for some
> controllers (the odd clowns :) but I have no idea how common these are.
> We obviously can't get rid of the check but since we cannot clear
> the read-only status register and there appears to be no specification 
> dictated upper limit on how long it should take for a software reset to 
> complete it just seems like we need to wait long enough to support the 
> slowest known device which may be the GoVault.
> 
> > 
> > > With respect to the 'ahci' case w/2.6.20-rc6 the GoVault device is 
> > > useable following boot although the below messages are being logged 
> > > during initialization.  Please let me know if you have any thoughts 
> > > on this.  
> > >   scsi1 : ahci
> > >   ata2: softreset failed (port busy but CLO unavailable)
> > >   ata2: softreset failed, retrying in 5 secs
> > >   ata2: port is slow to respond, please be patient (Status 0x80)
> > >   ata2: port failed to respond (30 secs, Status 0x80)
> > >   ata2: COMRESET failed (device not ready)
> > >   ata2: hardreset failed, retrying in 5 secs
> > >   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > >   ata2.00: ATAPI, max UDMA/66
> > >   ata2.00: configured for UDMA/66
> > 
> > The above should have been fixed in 2.6.20-rc6.  Please test it.  It was
> > caused by the ahci driver incorrectly clearing ahci CAP register and
> > fixed recently.
> 
> I'm clearly seeing this with 2.6.20-rc6 but unlike the ata-piix
> issue it does not appear to be dependent on the port to which the
> device is attached.  I've been playing around with this today and
> found that it could be solved by inserting a delay between the 
> ahci_stop_engine() call and BSY/DRQ check.
> 
> This change:
> --- linux-2.6.20-rc6/drivers/ata/ahci.c.orig	2007-01-30 11:01:20.000000000 -0800
> +++ linux-2.6.20-rc6/drivers/ata/ahci.c	2007-01-30 12:59:38.000000000 -0800
> @@ -804,6 +804,19 @@ static int ahci_softreset(struct ata_por
>  		goto fail_restart;
>  	}
> 
> +	{
> +		int delay;
> +		u8 stat;
> +		for (delay = 0; delay < 2000; delay+=100) {
> +			if (!(ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ)))
> +				break;
> +			msleep(100);
> +			stat = ahci_check_status(ap);
> +			ata_port_printk(ap, KERN_INFO, "delay=%d BSY=%d DRQ=%d\n",
> +				delay, (stat & ATA_BUSY)?1:0, (stat & ATA_DRQ)?1:0);
> +		}
> +	}
> +
>  	/* check BUSY/DRQ, perform Command List Override if necessary */
>  	if (ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ)) {
>  		rc = ahci_clo(ap);
> 
> Yielded this output both with and without the RDC inserted:
> scsi1 : ahci
> ata2: delay=0 BSY=1 DRQ=0
> ata2: delay=100 BSY=1 DRQ=0
> ata2: delay=200 BSY=1 DRQ=0
> ata2: delay=300 BSY=1 DRQ=0
> ata2: delay=400 BSY=1 DRQ=0
> ata2: delay=500 BSY=1 DRQ=0
> ata2: delay=600 BSY=1 DRQ=0
> ata2: delay=700 BSY=1 DRQ=0
> ata2: delay=800 BSY=1 DRQ=0
> ata2: delay=900 BSY=1 DRQ=0
> ata2: delay=1000 BSY=1 DRQ=0
> ata2: delay=1100 BSY=1 DRQ=0
> ata2: delay=1200 BSY=0 DRQ=0
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata2.00: ATAPI, max UDMA/66
> ata2.00: configured for UDMA/66
> 
> So it appears that we may also have a similar device slowness issue 
> with this driver.

Tejun,
I instrumented the code and found that, for the SATA hard drive, BSY was
set just before ahci_port_start() called ahci_init_port() and was clear
after ahci_init_port() returned.  For the GoVault, BSY was still set after
ahci_init_port() returned and remained set for almost 2 seconds.

The patch below, which gives BSY some extra time to clear, fixes the
problem.  Unlike the extra delay that the GoVault needs for ata-piix, I
believe this delay will only be incurred for attached devices that need
it.  Please let me know what you think.
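
To make the "only for devices that need it" point concrete, here is a
rough userspace sketch of the same bounded-poll idea.  It is not the
kernel code: struct fake_port, fake_check_status(), fake_msleep() and
wait_bsy_clear() are hypothetical stand-ins with a simulated clock, and
the 0x50 "ready" status value is only an illustrative assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ATA_BUSY 0x80u	/* BSY bit of the ATA status register */

/* Hypothetical simulated port: BSY stays set until clear_at_ms. */
struct fake_port {
	unsigned int now_ms;		/* simulated clock */
	unsigned int clear_at_ms;	/* when the device drops BSY */
};

/* Stand-in for a status read; 0x50 is an assumed "ready" value. */
static uint8_t fake_check_status(const struct fake_port *p)
{
	return p->now_ms < p->clear_at_ms ? ATA_BUSY : 0x50;
}

/* Stand-in for msleep(): just advances the simulated clock. */
static void fake_msleep(struct fake_port *p, unsigned int ms)
{
	p->now_ms += ms;
}

/*
 * Poll until BSY clears or timeout_ms elapses.  Returns true if BSY
 * cleared; *waited_ms receives the simulated time spent waiting.
 */
static bool wait_bsy_clear(struct fake_port *p, unsigned int timeout_ms,
			   unsigned int step_ms, unsigned int *waited_ms)
{
	unsigned int start = p->now_ms;
	uint8_t status = fake_check_status(p);

	assert(step_ms > 0);	/* a zero step would never advance time */

	while ((status & ATA_BUSY) && p->now_ms - start < timeout_ms) {
		fake_msleep(p, step_ms);
		status = fake_check_status(p);
	}

	*waited_ms = p->now_ms - start;
	return !(status & ATA_BUSY);
}
```

In this model a device that already has BSY clear returns without
sleeping at all, a device that needs almost 2 seconds waits only that
long, and a device that never clears BSY costs the full timeout.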

Thanks.

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@xxxxxxxxxx
http://www.ibm.com/linux/ltc


We encountered a problem where the BSY status bit is still set on
entry to the 'ahci' error handler during initialization of the
Quantum GoVault when attached to an ICH6R/ICH6RW controller.
This caused the software reset to fail its BSY/DRQ check, forcing
a hard reset, with the following messages logged:
  ata1: softreset failed (port busy but CLO unavailable)
  ata1: softreset failed, retrying in 5 secs
  ata1: port is slow to respond, please be patient (Status 0x80)
  ata1: port failed to respond (30 secs, Status 0x80)
  ata1: COMRESET failed (device not ready)
  ata1: hardreset failed, retrying in 5 secs

BSY was taking almost 2 seconds to clear after ahci_init_port()
returned in ahci_port_start(), so this patch gives BSY up to
3 extra seconds to clear, which eliminates the problem.

Signed-off-by: Gary Hade <garyhade@xxxxxxxxxx>

--- linux-2.6.20-rc7/drivers/ata/ahci.c.orig	2007-02-16 10:11:21.000000000 -0800
+++ linux-2.6.20-rc7/drivers/ata/ahci.c	2007-02-16 13:23:04.000000000 -0800
@@ -1423,6 +1423,8 @@ static int ahci_port_start(struct ata_po
 	void *mem;
 	dma_addr_t mem_dma;
 	int rc;
+	u8 status;
+	unsigned long timeout;
 
 	pp = kmalloc(sizeof(*pp), GFP_KERNEL);
 	if (!pp)
@@ -1477,6 +1479,17 @@ static int ahci_port_start(struct ata_po
 	/* initialize port */
 	ahci_init_port(port_mmio, hpriv->cap, pp->cmd_slot_dma, pp->rx_fis_dma);
 
+	status = ahci_check_status(ap);
+
+	/* for some devices we need to delay to allow BSY to clear */
+	if (status & ATA_BUSY) {
+		timeout = jiffies + 3*HZ;
+		while ((status & ATA_BUSY) && time_before(jiffies, timeout)) {
+			msleep(50);
+			status = ahci_check_status(ap);
+		}
+	}
+
 	return 0;
 }
 