On 30/08/2017 11:06, Greg Kroah-Hartman wrote:

> On Wed, Aug 30, 2017 at 10:55:37AM +0200, Mason wrote:
>
>> On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
>>
>>> To get back to the original issue here, the hardware seems to have died,
>>> the driver stops talking to it, and all is good. The "regression" here
>>> is that we now properly can determine that the hardware is crap.
>>
>> Before 4.12, when I unplugged my USB3 Flash drive, Linux would
>> detect a few "Uncorrected Non-Fatal errors" via AER, but it was
>> still possible to plug the drive back in.
>>
>> Since 4.12, once I unplug the drive, the whole USB3 card is marked
>> as dead (all 4 ports), and I can no longer plug anything in (not even
>> the USB2 drive that didn't have any issues, IIRC).
>>
>> It seems a bit premature to "mark as dead" something that remains
>> functional, doesn't it?
>
> I agree, but if the device sends all ones, it's a good indication it is
> really dead, right? Or something is wrong with it.

I wouldn't call it dead if I can plug the drive back in, and have it
working... But I agree that something fishy is happening...

>> Disclaimer, there are many variables in this setup, and I've only
>> tested a small fraction of the problem space: only one system,
>> only one USB3 board, only one USB3 Flash drive.
>
> Did you ever happen to narrow this down to a single git commit using
> 'git bisect'? I can't remember what happened in the beginning of this
> thread...

Mathias pointed out d9f11ba9f107aa335091ab8d7ba5eea714e46e8b

>>> So, how do you think we should proceed, delay a bit longer before saying
>>> the device is gone? How long is "long enough"? How many bus errors are
>>> we allowed to tolerate (hint, the PCI spec says none...)
>>>
>>> Maybe someone wants to get to the root problem here, why is the hardware
>>> suddenly reporting all 1s?
>>
>> I'm afraid I won't be able to make any progress on this front,
>> unless I can get my hands on a PCIe packet analyzer.
>
> Odds of that happening are pretty rare, right? I've never even seen one
> of those...

I had a "Summit T24 Analyzer" on my desk a few months ago, but
I was getting strange results, and the knowledgeable people in
my company were not available at the time.

http://teledynelecroy.com/protocolanalyzer/protocoloverview.aspx?seriesid=445

Regards.