Re: Testing for hardware bug in EHCI controllers

On Mon, 11 Mar 2013, Noone Nowhere wrote:

> Hello Alan, this is VoRTeX from the Linux kernel ATA wiki; greetings
> from occupied Greece... Nick has used UNIX since 2003 (FreeBSD 5.2.1
> back then) and after all these years is an expert in Linux and in
> HDDs. In 2008 a repeated silent data corruption occurred on a specific
> USB2SATA enclosure that we still have. Three copies of a large data
> image produced three different checksums. It could be libata, the
> kernel USB code, the SB700 USB controller (from the quite troublesome
> ASUS M3A78 PRO), or WTF...
> 
> We have been monitoring the libata and USB lists for some years, and
> you are doing an amazing job with respectable continuity and
> endurance; keep up the good work. You are granted access to ALL of our
> hardware, which means a lot of old x86 boards and all kinds of cards,
> but our time is limited.
> 
> Initial testing shows that the underlying bug is present in many
> chipsets. Once we are done with testing, please notify Intel and AMD
> officially so that they update their southbridge specification updates
> and errata, respectively...

Intel probably doesn't care, because they don't make the affected 
chipsets any more.  I don't know what AMD is currently making.  Do you?

> Unfortunately, the program needs improvements. To begin with, a single
> USB 2.0 128 MB(!) stick was used for all tests. The VT8235M does not
> have the problem, but errors were encountered before the stick was
> detected. After some removals and insertions it worked, but going from
> 100 to 1000 took considerably more time than on all other hosts. It
> might be good to print the version and elapsed time at 100, 200, and
> so on. How to run the program on hosts without the bug is unclear: for
> how many iterations should it run?

I don't know.  In my experience, 1000 iterations have been enough for 
the bug to show up.  But I have tested on only two computers.
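The progress-reporting change suggested above (print the program version and elapsed time every 100 iterations, over the 1000 iterations that have so far been enough for the bug to show up) could look roughly like the following. This is a minimal Python sketch under stated assumptions, not the actual test program: `PROGRAM_VERSION`, `run_one_iteration` and the output format are hypothetical placeholders. It only reports progress; the bug itself is reported by the kernel driver in dmesg, not by the test program.

```python
import time

# Hypothetical placeholders: the real test program's version string and
# per-iteration work are not shown in this thread.
PROGRAM_VERSION = "0.1"
REPORT_INTERVAL = 100     # print progress every 100 iterations
TOTAL_ITERATIONS = 1000   # 1000 iterations has been enough to show the bug


def run_one_iteration(i):
    """Stand-in for one pass of the real stress test (hypothetical)."""
    pass


def run_with_progress(total=TOTAL_ITERATIONS, interval=REPORT_INTERVAL,
                      step=run_one_iteration, out=print):
    """Run `total` iterations, reporting the version and elapsed time every
    `interval` iterations. Returns the iteration counts at which a
    progress line was emitted."""
    start = time.monotonic()
    reported = []
    for i in range(1, total + 1):
        step(i)
        if i % interval == 0:
            elapsed = time.monotonic() - start
            out(f"v{PROGRAM_VERSION}: {i}/{total} iterations, "
                f"{elapsed:.1f} s elapsed")
            reported.append(i)
    return reported


if __name__ == "__main__":
    # A slow host shows up as larger gaps between successive progress lines.
    run_with_progress()
```

Reporting elapsed time at each step would make a slow host, such as the VT8235M case, easy to spot, and reporting every 100 iterations gives earlier feedback than reporting only every 1000.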

> Intel ICH5 chipsets affected by the bug start making a buzzing noise
> when the program runs! Also, assuming that 3.8.2 has as1617 applied, we
> probably ran into another hardware bug while testing a MosChip PCIe
> EHCI controller and an SB850, as the program aborts saying:
> "Block count is too large
> Block count is too small: 513"
> meaning that something got shifted from one block to another? Go figure!
> 
> Lastly, on A50M dmesg prints "EHCI hardware bug detected:"... and the program prints:

What is A50M?

> "URB timed out; bug may be present"
> "Wrong URB completed"
> So why, on Intel hosts, does only dmesg show that the bug is present,
> and not the program? Did it need more time, or is something else going on?

The program does not detect the bug; it merely creates conditions where
the bug is likely to show up.  The kernel driver detects the bug, when
it occurs.

> Should you need more info, ask on the mailing list and you shall
> receive it. If you want to improve and correct the program (as it
> cannot run everywhere as is), we suggest you move it to a new
> subject on the USB list so that we are all running the latest and
> greatest. If you do not agree with the improvements, we will send
> you details on the failed southbridges, including SL specs or
> revisions, but only for systems that were able to run your program.
> We have also tested two of our three USB 3.0 controllers.

What improvements do you suggest?  This wasn't clear from what you 
wrote above.  Just the increments by 100 instead of by 1000?

Yes, please do send information on the failed controllers.

Alan Stern


