Re: Testing for hardware bug in EHCI controllers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 26 Mar 2013, Noone Nowhere wrote:

> Hello again Alan, we read that the patch broke.

What patch?  The one that works around the Intel/AMD hardware problem?  
Yes, it had a mistake, which has now been fixed.  (Although the fix has 
not yet been released in a 3.8.stable kernel.)

>  Damn, our fear was
> justified. If we are right, it will break again because it affects
> various
> devices using different busses, each one with its limitations and
> defects.

That reasoning doesn't make sense.  Practically every line of code in 
ehci-hcd.c and its associated files affects various devices using 
different buses.  If your fear was justified, the driver would never 
work at all.

>  That's why we wrote to Sarah(and got no reply yet) that

Your message to Sarah was little more than a rant about issues that are 
out of her control and took place long before she started working at 
Intel.  Surely you didn't expect her to have a constructive reply?

> vendors
> should fix their h/w bugs in-house or give open source developers full
> documentation (like nda datasheets,bios spec,bios spec up)
> to cope with their >buggy< hardware!

Well, it's questionable whether this really should be called a bug.  
Clearly it is undesirable behavior, but strictly speaking it is not in
violation of the EHCI spec.

> > Probably it is cured.  But something is still wrong, even though it may
> > be unrelated.
> 
> Since something is broken and the last program tested exits at A50M,
> we have some questions for today prior testing A50M, moschip and
> ICH4M:
> 
> 1)Do we need to apply a third
> patch(http://marc.info/?l=linux-usb&m=136379040918329&w=2) [it's upper
> part to be specific]?

Are you considering the hardware workaround and the bug-detection patch
as the first two?  Then yes, a third patch is needed, but not the one
you mentioned.  The third patch is here:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=d714aaf649460cbfd5e82e75520baa856b4fa0a0

Oddly, I can't find the patch submission anywhere in the email
archives.  Maybe I forgot to CC: linux-usb when sending it in.

> If yes, please make
> a cumulative patch for 3.8 series before we mess things up and post it
> under this subject.

It is below.

> At the same post, add the final test program with
> program version number printing as well as a "do NOT unplug the usb
> and a leave the program to go up to 1000" message. Also point out that
> the
> device must not be inside the unusual devs file as you told us. This
> will make it easier for others to follow and help you.

You can easily make these changes and post the result yourself.

> 2)If A50M still fails, do you have time to analyze our usbmon output?

Yes.

> 3)Has any type of transfer corruption been observed from this bug?

Not as far as I know.

Alan Stern



Index: 3.8/drivers/usb/host/ehci-q.c
===================================================================
--- 3.8.orig/drivers/usb/host/ehci-q.c
+++ 3.8/drivers/usb/host/ehci-q.c
@@ -547,7 +547,7 @@ qh_completions (struct ehci_hcd *ehci, s
 	if (stopped != 0 || hw->hw_qtd_next == EHCI_LIST_END(ehci)) {
 		switch (state) {
 		case QH_STATE_IDLE:
-			qh_refresh(ehci, qh);
+//			qh_refresh(ehci, qh);
 			break;
 		case QH_STATE_LINKED:
 			/* We won't refresh a QH that's linked (after the HC
@@ -1170,7 +1170,7 @@ static void single_unlink_async(struct e
 	struct ehci_qh		*prev;
 
 	/* Add to the end of the list of QHs waiting for the next IAAD */
-	qh->qh_state = QH_STATE_UNLINK;
+	qh->qh_state = QH_STATE_UNLINK_WAIT;
 	if (ehci->async_unlink)
 		ehci->async_unlink_last->unlink_next = qh;
 	else
@@ -1213,9 +1213,19 @@ static void start_iaa_cycle(struct ehci_
 
 		/* Do only the first waiting QH (nVidia bug?) */
 		qh = ehci->async_unlink;
-		ehci->async_iaa = qh;
-		ehci->async_unlink = qh->unlink_next;
-		qh->unlink_next = NULL;
+
+		/*
+		 * Intel (?) bug: The HC can write back the overlay region
+		 * even after the IAA interrupt occurs.  In self-defense,
+		 * always go through two IAA cycles for each QH.
+		 */
+		if (qh->qh_state == QH_STATE_UNLINK_WAIT) {
+			qh->qh_state = QH_STATE_UNLINK;
+		} else {
+			ehci->async_iaa = qh;
+			ehci->async_unlink = qh->unlink_next;
+			qh->unlink_next = NULL;
+		}
 
 		/* Make sure the unlinks are all visible to the hardware */
 		wmb();
@@ -1232,6 +1242,7 @@ static void start_iaa_cycle(struct ehci_
 static void end_unlink_async(struct ehci_hcd *ehci)
 {
 	struct ehci_qh		*qh;
+	__hc32			tok1, tok2;
 
 	if (ehci->has_synopsys_hc_bug)
 		ehci_writel(ehci, (u32) ehci->async->qh_dma,
@@ -1242,6 +1253,7 @@ static void end_unlink_async(struct ehci
 	ehci->async_unlinking = true;
 	while (ehci->async_iaa) {
 		qh = ehci->async_iaa;
+		tok1 = ACCESS_ONCE(qh->hw->hw_token);
 		ehci->async_iaa = qh->unlink_next;
 		qh->unlink_next = NULL;
 
@@ -1250,8 +1262,14 @@ static void end_unlink_async(struct ehci
 
 		qh_completions(ehci, qh);
 		if (!list_empty(&qh->qtd_list) &&
-				ehci->rh_state == EHCI_RH_RUNNING)
+				ehci->rh_state == EHCI_RH_RUNNING) {
+			udelay(10);
+			tok2 = ACCESS_ONCE(qh->hw->hw_token);
+			if (tok1 != tok2)
+				ehci_err(ehci, "EHCI hardware bug detected: %08x %08x\n",
+						tok1, tok2);
 			qh_link_async(ehci, qh);
+		}
 		disable_async(ehci);
 	}
 	ehci->async_unlinking = false;
Index: 3.8/drivers/usb/host/ehci-hcd.c
===================================================================
--- 3.8.orig/drivers/usb/host/ehci-hcd.c
+++ 3.8/drivers/usb/host/ehci-hcd.c
@@ -748,11 +748,9 @@ static irqreturn_t ehci_irq (struct usb_
 		/* guard against (alleged) silicon errata */
 		if (cmd & CMD_IAAD)
 			ehci_dbg(ehci, "IAA with IAAD still set?\n");
-		if (ehci->async_iaa) {
+		if (ehci->async_iaa)
 			COUNT(ehci->stats.iaa);
-			end_unlink_async(ehci);
-		} else
-			ehci_dbg(ehci, "IAA with nothing unlinked?\n");
+		end_unlink_async(ehci);
 	}
 
 	/* remote wakeup [4.3.1] */
Index: 3.8/drivers/usb/host/ehci-timer.c
===================================================================
--- 3.8.orig/drivers/usb/host/ehci-timer.c
+++ 3.8/drivers/usb/host/ehci-timer.c
@@ -305,7 +305,7 @@ static void ehci_iaa_watchdog(struct ehc
 	 * (a) SMP races against real IAA firing and retriggering, and
 	 * (b) clean HC shutdown, when IAA watchdog was pending.
 	 */
-	if (ehci->async_iaa) {
+	if (1) {
 		u32 cmd, status;
 
 		/* If we get here, IAA is *REALLY* late.  It's barely

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux