On Fri, 1 May 2009, Rogério Brito wrote:

> > It's not all that simple.  The host controller allows the OS to set the
> > number of hardware retries to 1, 2, 3, or unlimited.  Linux uses 3;
> > those XactErr debugging messages in your log show that the driver was
> > extending the number of retries in software.
>
> Right. I didn't know that. Obviously, setting it to unlimited can give
> undefined behavior of the computer.

No, the behavior would be defined.  But it wouldn't be what we want.
Instead of getting an immediate error followed by a reset, you would
have to wait for the command to time out (somewhere between 10 and 30
seconds) before the reset occurred.

> > It's not possible to change the time interval between retries done by
> > the hardware.  While it is possible in theory to change the interval
> > between retries done by the driver, it would be rather difficult and
> > so ehci-hcd doesn't attempt it.
>
> Oh, what a pity. It seems that the device at hand sort of gets in shape
> again after some time, since I have an automounter here and the device
> nodes appear again under dev and it auto-mounts the device at the
> appropriate mount point. Weird.

There is probably a reset in between.  I doubt that the device recovers
all by itself.

> > The software retries were introduced to solve one particular problem:
> > Many EHCI controllers will generate a transaction error if a data
> > transfer is occurring on one port at the same time as a device is
> > being unplugged on another port.
>
> Right. I just got myself a (non powered) USB hub and I noticed one thing
> (unrelated to this problem): if I plug a USB disk to this hub and, then,
> plug a printer, very weird things happen, like the disc being unmounted
> or things like that.

That is different from what I was talking about.  The Intel controllers
in question work okay when a new device is plugged in, but they get
errors when a device is unplugged.

> > This is clearly a hardware bug, and the software retries were intended
> > to work around it.  In practice only a couple of software retries are
> > needed; if the transfer hasn't succeeded by that point then it's never
> > going to succeed.  I set the upper limit to 32 retries just to be
> > conservative.
>
> OK. Thanks for the nice and clear explanation of the problem. I only
> wonder why I am not seeing these errors on other machines while I *do*
> see them on this one (an Intel ICH5).

Quality varies a lot with USB components, and sometimes you can't tell
where the problem is.  I've got a USB disk drive and cable that do not
work on my home PC, although they do work on my office PC.  If I use a
different cable then the drive does work on the home PC.  If I use the
same cable but substitute a USB stick for the drive, again it works.
So which component is bad: the home PC, the cable, or the drive?

> > If transaction errors aren't caused by noise in the cable then they
> > are almost always caused by bugs or failures in the device.
>
> I will try again with a shorter and newer cable. Let's see how that
> works. BTW, is there any way to check the quality of a cable? I have a
> multimeter here and I would be willing to do some extensive tests.
> Testing the USB enclosure is also pretty feasible.

I don't know any way to test these things without using some pretty
fancy equipment.

Alan Stern
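
For readers who want the retry scheme discussed above in concrete form,
here is a rough sketch.  It is a toy Python model, not the ehci-hcd code:
the names run_transfer and failing_first are hypothetical, and treating
the hardware count as "attempts per round" is a simplifying assumption.
Only the two numbers (3 hardware retries, an upper limit of 32 software
retries) come from the message itself.

```python
HW_RETRIES = 3        # from the message: Linux sets the hardware count to 3
SW_RETRY_LIMIT = 32   # from the message: upper limit on software retries


def run_transfer(attempt_ok, hw_retries=HW_RETRIES, sw_limit=SW_RETRY_LIMIT):
    """Toy model; returns (succeeded, software_retries_used).

    attempt_ok is a hypothetical callable standing in for one bus
    transaction: True means success, False means a transaction error.
    """
    sw_retries = 0
    while True:
        # The controller tries up to hw_retries times on its own
        # (assumption: modelled here as attempts per round).
        for _ in range(hw_retries):
            if attempt_ok():
                return True, sw_retries
        # The hardware gave up and reported a transaction error (XactErr).
        if sw_retries >= sw_limit:
            # The driver stops retrying and reports the error upward.
            return False, sw_retries
        # The driver extends the retries in software by starting a new round.
        sw_retries += 1


def failing_first(n):
    """Hypothetical fake device: the first n attempts fail, later ones succeed."""
    state = {"calls": 0}

    def attempt():
        state["calls"] += 1
        return state["calls"] > n

    return attempt


print(run_transfer(failing_first(4)))   # (True, 1): one software retry sufficed
print(run_transfer(lambda: False))      # (False, 32): limit hit, error reported
```

The second call mirrors the point made above: a transfer that has not
succeeded after a couple of software retries is never going to succeed,
so the limit of 32 only bounds how long the driver keeps trying before
it gives up.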