Re: Testing for hardware bug in EHCI controllers

Hello Alan, this is VoRTeX from the Linux kernel ata wiki, greetings
from occupied Greece... Nick has been using UNIX since 2003 (FreeBSD
5.2.1 back then) and after all these years is an expert in Linux and
in HDDs. In 2008 we saw repeated silent data corruption with a specific
USB2SATA enclosure that we still have: three copies of a large data
image produced three different checksums. It could be libata, the
kernel USB code, the SB700 USB (from the quite troublesome ASUS M3A78
PRO) or WTF...
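
For anyone who wants to repeat the "three copies, three sums" check, here
is a rough Python sketch. It is not what we ran in 2008; the function
names sha256_of and compare_copies and the choice of SHA-256 are only
assumptions for illustration. It hashes each copy in chunks and reports
whether all digests agree.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large images do not need to fit in RAM.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(paths):
    """Hash every copy; return (digests by path, True if all digests match)."""
    digests = {str(p): sha256_of(p) for p in paths}
    all_match = len(set(digests.values())) == 1
    return digests, all_match
```

Call compare_copies() with the paths of the copied images; if the second
return value is False, at least one copy differs from the others.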

We have been following the libata and USB lists for some years. You are
doing an amazing job, with respectable continuity and endurance; keep
up the good work. You are granted access to ALL of our hardware, which
means a lot of old x86 boards and all kinds of cards, but our time is
limited.

Initial testing shows that the underlying bug is found in many
chipsets. Once we are done with testing, please notify Intel and AMD
officially so that they can add it to their southbridge specification
updates and errata, respectively...

Unfortunately the program needs improvements. To begin with, a single
USB 2.0 128 MB (!) stick was used in all tests. The VT8235M does not
have the problem, but errors were encountered before the stick was
detected. After some removals and insertions it worked, but going from
100 to 1000 took considerably more time than on all other hosts. It
might be good to print the version and the time at 100, 200 and so on.
Program execution on non-broken hosts is unclear: for how many
iterations should it run? Intel ICH5 chipsets affected by the bug start
making a buzzing noise when the program runs! Also, assuming that 3.8.2
has as1617 applied, we probably fell into another hardware bug while
testing a MosChip PCIe EHCI controller and an SB850, as the program
aborts saying:
"Block count is too large
Block count is too small: 513"
meaning that something got shifted from one block to another? Go figure!
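
To make the request for version and time printing concrete, here is a
rough Python sketch of what we have in mind. It is not taken from the
test program; run_with_progress, do_iteration and the version string are
hypothetical names, and the real program would do its own work inside
the loop.

```python
import time


def run_with_progress(total, step, do_iteration, version="unknown",
                      clock=time.monotonic):
    """Run do_iteration(i) for i = 1..total, logging every `step` iterations.

    Each log line carries the program version and the elapsed time, so slow
    hosts stand out. Returns the list of lines that were printed.
    """
    lines = []
    start = clock()
    for i in range(1, total + 1):
        do_iteration(i)  # placeholder for one pass of the real test
        if i % step == 0:
            elapsed = clock() - start
            line = f"[{version}] iteration {i}: {elapsed:.1f} s elapsed"
            print(line)
            lines.append(line)
    return lines
```

With step=100 this prints one line at 100, 200 and so on, which would
also show how long a host takes between checkpoints.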

Lastly, on the A50M dmesg prints "EHCI hardware bug detected:"... and the
program prints:
"URB timed out; bug may be present"
"Wrong URB completed"
So why is it that on Intel hosts only dmesg proves the presence of the
bug and not the program? Did it need more time, or is something else
going on?

Should you need more info, ask on the mailing list and you shall
receive it. If you want to improve and correct the program (as it
cannot run everywhere as is), we suggest moving it under a new
subject on the USB list so that we are all running the latest and
greatest. If you do not agree with the improvements, then we will send
you details on the failed southbridges, including SL specs or revisions,
but only for systems that were able to run your program. We have also
tested two out of our three USB 3.0 controllers.