On Mon, Oct 09, 2017 at 10:45:39PM +0200, Mason wrote:
> On 09/10/2017 19:01, Bjorn Helgaas wrote:
> ...
> > In that thread, Mason reported a regression that looks similar, but as
> > far as I can tell, we never identified a root cause.
> >
> >   1) The problem Mason reported was on a Tango platform, which has a
> >      known hardware issue that corrupts data when simultaneous config
> >      and MMIO accesses occur. You're seeing the problem on a
> >      different platform, which is very helpful.
>
> As mentioned here:
> https://www.mail-archive.com/linux-usb@xxxxxxxxxxxxxxx/msg94020.html
>
> When I disable the AER driver, not a single config space access
> occurs when a USB drive is unplugged. So I'm 99.99% sure that
> the issue is NOT caused by tango's bad design. (I got the vibe
> that nobody cared about tango's issue because it was assumed
> that the design flaw was responsible for it.)

I agree; I don't think this is Tango's fault.

Can you test fe190ed0d602 and d9f11ba9f107 to determine whether
d9f11ba9f107 is the culprit?

If it is the culprit, can you try reverting it on a current kernel to
see if that fixes it?

If d9f11ba9f107 is not the culprit, can you bisect to discover exactly
where it broke?

> >   2) Mathias suggested d9f11ba9f107 ("xhci: Rework how we handle
> >      unresponsive or hoptlug removed hosts"), which appeared in
> >      v4.12-rc1, as a possible culprit, but I don't see a bisection
> >      that definitively identifies this commit.
> >
> >      Is it possible for you to test both fe190ed0d602 ("xhci: Do not
> >      halt the host until both HCD have disconnected their devices.")
> >      and d9f11ba9f107 ("xhci: Rework how we handle unresponsive or
> >      hoptlug removed hosts") so we can tell for sure whether
> >      d9f11ba9f107 broke it?