Alan Stern <stern@...> writes:
>
> On Wed, 29 Aug 2012, Gary E. Miller wrote:
>
> > Yo Alan!
> >
> > On Wed, 29 Aug 2012 17:08:43 -0400 (EDT)
> > Alan Stern <stern@...> wrote:
> >
> > > > Uh, not easy. This is a production machine. Acceptable downtime is
> > > > very small. There should be a way to turn it off at runtime.
> > >
> > > No, there shouldn't. It's a debugging feature; while debugging we
> > > want to see all occurrences of these messages.
> > >
> > > Production machines should not run debugging kernels.
> >
> > We'll have to agree to disagree. My experience is the fun bugs only happen
> > in production. A big NASA study years ago proved that after spending $1M
> > per line of code that only half the bugs could be caught before live
> > missions.
>
> Well, put it this way: Production machines should not run debugging
> kernels unless you're actually trying to debug something. For normal
> operation, a debugging kernel should not be used.
>
> Remember, the purpose of a debugging kernel is not to provoke bugs or
> notify you that the bugs exist. Normal kernels do these things just
> fine. The purpose is to help find the cause of bugs.
>
> Alan Stern
>

I believe the focus of this thread has gone in the wrong direction. The question should not be how to disable the flood of "detected XactErr" messages, but (a) why the message occurs in this particular case, and (b) why there are so many of them. My opinion is:

(a) The appearance of a "detected XactErr" message indicates some BRUTAL CONDITION on the communication pipe of that particular USB port. The message is generated after the host hardware has failed to receive a proper protocol response from a device in THREE BACK-TO-BACK ATTEMPTS (assuming CERR in EHCI.c is set to 3). This means the situation is not some accidental signal glitch but a fatal condition on the pipe. In other words, the USB device has likely lost its configuration or gone dead.
Therefore, the host should not retry this low-level transaction, and should instead resort to a higher-level recovery procedure (port reset and re-enumeration). This brings us to (b):

(b) Instead of switching to recovery, the Linux USB driver attempts 32 additional retries. As explained in (a), these retries serve no purpose, except that they generate really alarming debug logs that are impossible to miss.

Sorry for reviving a two-year-old thread. My problem with the Linux USB stack is why it makes 32 extra attempts on a dead link. What is the rationale behind this 32-retry "recovery policy"?

Thanks,
Alexei
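
To put numbers on (a) and (b), here is a rough stand-alone C model of the policy as I described it. It is not the kernel's code: HW_CERR, SW_RETRY_MAX and all function names are hypothetical, taken only from the figures quoted above (CERR = 3, 32 additional software retries), and I assume that the device never responds and that every software retry re-arms the hardware for another three attempts.

```c
#include <stdbool.h>

#define HW_CERR      3   /* assumed: back-to-back hardware attempts per transaction */
#define SW_RETRY_MAX 32  /* assumed: additional software retries after the first XactErr */

/* Hypothetical dead device: it never returns a valid handshake. */
static bool device_responds(void)
{
    return false;
}

/*
 * Model of one queued transaction: the host controller retries on its own
 * until the error counter runs out.  Returns true on success, false on
 * XactErr.  Every attempt on the bus is added to *bus_attempts.
 */
static bool hw_transaction(unsigned *bus_attempts)
{
    for (int cerr = HW_CERR; cerr > 0; cerr--) {
        (*bus_attempts)++;
        if (device_responds())
            return true;
    }
    return false;
}

/*
 * Model of the software policy: after the first XactErr, requeue the
 * transaction up to SW_RETRY_MAX more times, logging each retry.
 * Returns the total number of bus attempts; *retries_logged receives the
 * number of retry messages that would be written to the log.
 */
unsigned attempts_on_dead_link(unsigned *retries_logged)
{
    unsigned bus_attempts = 0;

    *retries_logged = 0;
    if (hw_transaction(&bus_attempts))
        return bus_attempts;

    for (int retry = 1; retry <= SW_RETRY_MAX; retry++) {
        (*retries_logged)++;            /* one "detected XactErr" line per retry */
        if (hw_transaction(&bus_attempts))
            break;
    }
    return bus_attempts;
}
```

Under these assumptions a dead device costs 3 + 32 * 3 = 99 attempts on the bus and 32 log lines before the transfer is finally given up, whereas switching to recovery after the first XactErr would cost 3 attempts.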