Re: Testing for hardware bug in EHCI controllers

2013/2/26 Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>:
> Sarah (and anyone else who's interested):
>
> A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
> controllers.  You pointed out that these are rather old components, not
> being used in current systems, which is quite true.
>
> Now I have figured out a simple way for anyone to test for this bug in
> any EHCI controller, without the need for a g-zero gadget.  It's a
> two-part procedure:
>
>         Apply the patch below (which is written for vanilla 3.8) and
>         load the resulting driver.  The patch adds an explicit test
>         to ehci-hcd for detecting the bug.
>
>         Then plug in an ordinary USB flash drive and run the attached
>         program (as root), giving it the device path for the flash
>         drive as the single command-line argument.  For example:
>
>                 sudo ./ehci-test /dev/bus/usb/002/003
>
> The program won't do anything bad to the flash drive; it just reads the
> first 256 KB of data over and over again, now and then unlinking an URB
> to try and trigger the bug.  If the program works right, it will print
> out a loop counter every hundred iterations.  If it runs for 1000
> iterations with no error messages in the kernel log, you may consider
> that the controller has passed the test.  This should take under a
> minute, depending on the hardware speed.
>
> The program won't stop by itself unless something goes wrong.  You can
> kill it with ^C or more simply by unplugging the flash drive.  (If you
> want to be safe, make sure there are no mounted filesystems on the
> drive before running the test program.)
>
> If the hardware bug is detected, the kernel patch will print error
> messages to the system log.  For example, when I run the test on the
> Intel controller in this computer, I get:
>
> [  150.019441] usb-storage 3-8:1.0: disconnect by usbfs
> [  150.271190] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 00008d00 80008d00
> [  150.591089] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 00008d00 80008d00
> [  151.538560] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 00008d00 80008d00
> [  151.857569] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 00008d00 80008d00
> [  152.018886] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 00008d00 80008d00
> [  152.179810] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 80008d00 00008d00
> [  153.211804] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 00008d00 80008d00
> [  153.374497] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 00008d00 80008d00
> [  153.770443] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 80008d00 00008d00
> [  154.247861] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 82008d80 00008d00
> [  154.566912] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 82008d80 00008d00
> [  155.359101] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 00008d00 80008d00
> [  155.838132] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 00008d00 80008d00
> [  156.791107] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 80008d00 00008d00
> [  157.267620] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 00008d00 80008d00
> [  159.252057] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 80008d00 00008d00
> [  159.886048] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 80008d00 00008d00
> [  160.206625] ehci-pci 0000:00:1d.7: EHCI hardware bug detected: 02008d80 80008d00
> ...
>
> You get the idea.  The values in the two columns on the right are
> always supposed to be equal; when they aren't it indicates that the
> controller has done a DMA write at a time when ehci-hcd isn't expecting
> one to happen.
>
> I'd be interested to hear the results of testing on a variety of
> controllers.  (This computer also has an NEC EHCI controller, and that
> one does not have the bug.)  Do the EHCI controllers on current Intel
> chipsets pass the test?  What about other vendors?
>
> Thanks to all who try it out and report their results.
I tested on a Sandy Bridge platform running v3.8.
The first time I ran the test, I got the following output. After that,
it was hard to get any output at all.

sudo ./ehci-test /dev/bus/usb/001/003
[  140.855342] usb-storage 1-1.2:1.0: disconnect by usbfs
Invalid URB stat[  140.863000] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f300 ep1in-bulk
[  140.871303] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f0c0 ep1in-bulk
[  140.878231] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545fcc0 ep1in-bulk
[  140.885158] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545fb40 ep1in-bulk
[  140.892088] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f9c0 ep1in-bulk
[  140.899015] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f780 ep1in-bulk
[  140.905941] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f240 ep1in-bulk
[  140.912870] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f900 ep1in-bulk
[  140.919799] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545fc00 ep1in-bulk
[  140.926725] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f540 ep1in-bulk
[  140.933655] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f3c0 ep1in-bulk
[  140.940583] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545fd80 ep1in-bulk
[  140.947512] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f600 ep1in-bulk
[  140.954440] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f180 ep1in-bulk
[  140.961368] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f000 ep1in-bulk
[  140.968297] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545fa80 ep1in-bulk
[  140.975223] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545f840 ep1in-bulk
us -32, act len [  140.982151] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545fe40 ep1in-bulk
[  140.990459] ehci-pci 0000:00:1a.0: shutdown urb ffff88014545ff00 ep1in-bulk
[  140.997388] ehci-pci 0000:00:1a.0: shutdown urb ffff880145f08000 ep1in-bulk
[  141.004316] ehci-pci 0000:00:1a.0: shutdown urb ffff880145f080c0 ep1in-bulk
[  141.011245] ehci-pci 0000:00:1a.0: shutdown urb ffff880145f08180 ep1in-bulk
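
The pass criterion quoted above is "no error messages in the kernel
log", so a saved log can be checked mechanically. The following is only
a rough sketch, not part of Alan's patch or test program. It assumes the
log lines contain the exact message text from the patch's ehci_err()
call; the function names parse_bug_line() and count_bug_lines() are
made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 and fill tok1/tok2 if the line carries the message emitted by
 * the test patch ("EHCI hardware bug detected: %08x %08x"), else return 0.
 */
static int parse_bug_line(const char *line, uint32_t *tok1, uint32_t *tok2)
{
	static const char marker[] = "EHCI hardware bug detected: ";
	const char *p;

	assert(line != NULL && tok1 != NULL && tok2 != NULL);

	p = strstr(line, marker);
	if (p == NULL)
		return 0;
	p += sizeof(marker) - 1;	/* skip past the marker text */

	/* Two hexadecimal words of at most 8 digits each. */
	return sscanf(p, "%8" SCNx32 " %8" SCNx32, tok1, tok2) == 2;
}

/* Count matching lines in a log stream (hypothetical helper). */
static unsigned long count_bug_lines(FILE *in)
{
	char line[512];
	uint32_t t1, t2;
	unsigned long n = 0;

	while (fgets(line, sizeof(line), in) != NULL)
		if (parse_bug_line(line, &t1, &t2))
			n++;
	return n;
}
```

Feeding it the output of dmesg (for instance through a pipe on stdin)
and getting a count of zero after 1000 iterations would correspond to
the controller passing. The "shutdown urb" lines in the log above do
not match the marker, so they are not counted.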

>
> Alan Stern
>
>
>
>
> Index: usb-3.8/drivers/usb/host/ehci-q.c
> ===================================================================
> --- usb-3.8.orig/drivers/usb/host/ehci-q.c
> +++ usb-3.8/drivers/usb/host/ehci-q.c
> @@ -547,7 +547,7 @@ qh_completions (struct ehci_hcd *ehci, s
>         if (stopped != 0 || hw->hw_qtd_next == EHCI_LIST_END(ehci)) {
>                 switch (state) {
>                 case QH_STATE_IDLE:
> -                       qh_refresh(ehci, qh);
> +//                     qh_refresh(ehci, qh);
>                         break;
>                 case QH_STATE_LINKED:
>                         /* We won't refresh a QH that's linked (after the HC
> @@ -1232,6 +1232,7 @@ static void start_iaa_cycle(struct ehci_
>  static void end_unlink_async(struct ehci_hcd *ehci)
>  {
>         struct ehci_qh          *qh;
> +       __hc32                  tok1, tok2;
>
>         if (ehci->has_synopsys_hc_bug)
>                 ehci_writel(ehci, (u32) ehci->async->qh_dma,
> @@ -1242,6 +1243,7 @@ static void end_unlink_async(struct ehci
>         ehci->async_unlinking = true;
>         while (ehci->async_iaa) {
>                 qh = ehci->async_iaa;
> +               tok1 = ACCESS_ONCE(qh->hw->hw_token);
>                 ehci->async_iaa = qh->unlink_next;
>                 qh->unlink_next = NULL;
>
> @@ -1250,8 +1252,14 @@ static void end_unlink_async(struct ehci
>
>                 qh_completions(ehci, qh);
>                 if (!list_empty(&qh->qtd_list) &&
> -                               ehci->rh_state == EHCI_RH_RUNNING)
> +                               ehci->rh_state == EHCI_RH_RUNNING) {
> +                       udelay(10);
> +                       tok2 = ACCESS_ONCE(qh->hw->hw_token);
> +                       if (tok1 != tok2)
> +                               ehci_err(ehci, "EHCI hardware bug detected: %08x %08x\n",
> +                                               tok1, tok2);
>                         qh_link_async(ehci, qh);
> +               }
>                 disable_async(ehci);
>         }
>         ehci->async_unlinking = false;
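
The two values printed by the ehci_err() call in the patch are
snapshots of the QH overlay's hw_token word taken before and after the
delay. To see which fields the controller changed, the word can be
split up according to the qTD token layout in the EHCI specification.
This is a userspace sketch for reading the log output, not driver code;
struct ehci_token, decode_token() and print_token_diff() are
hypothetical names chosen here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of the qTD/QH overlay token word (EHCI specification layout). */
struct ehci_token {
	unsigned toggle;	/* bit 31: data toggle */
	unsigned total_bytes;	/* bits 30:16: bytes left to transfer */
	unsigned ioc;		/* bit 15: interrupt on complete */
	unsigned c_page;	/* bits 14:12: current buffer page */
	unsigned cerr;		/* bits 11:10: error counter */
	unsigned pid;		/* bits 9:8: 0 = OUT, 1 = IN, 2 = SETUP */
	unsigned status;	/* bits 7:0: bit 7 = Active, bit 6 = Halted */
};

static struct ehci_token decode_token(uint32_t t)
{
	struct ehci_token f;

	f.toggle      = (t >> 31) & 0x1;
	f.total_bytes = (t >> 16) & 0x7fff;
	f.ioc         = (t >> 15) & 0x1;
	f.c_page      = (t >> 12) & 0x7;
	f.cerr        = (t >> 10) & 0x3;
	f.pid         = (t >> 8) & 0x3;
	f.status      = t & 0xff;
	return f;
}

/* Print every field that differs; return the number of differing fields. */
static int print_token_diff(FILE *out, uint32_t tok1, uint32_t tok2)
{
	struct ehci_token a = decode_token(tok1);
	struct ehci_token b = decode_token(tok2);
	int n = 0;

	assert(out != NULL);

#define CMP(field)							\
	do {								\
		if (a.field != b.field) {				\
			fprintf(out, "  %-11s %#x -> %#x\n",		\
				#field, a.field, b.field);		\
			n++;						\
		}							\
	} while (0)

	CMP(toggle);
	CMP(total_bytes);
	CMP(ioc);
	CMP(c_page);
	CMP(cerr);
	CMP(pid);
	CMP(status);
#undef CMP
	return n;
}
```

Applied to the pair "00008d00 80008d00" from Alan's log, only the data
toggle bit differs. For "82008d80 00008d00" three fields differ: the
toggle, the byte count (512 versus 0) and the status byte (Active bit
set versus clear).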



-- 
Best regards
Tianyu Lan