Re: xHCI bug

Felipe Balbi <balbi@xxxxxx> · Thu, 6 Nov 2014 10:36:30 -0600

On Thu, Nov 06, 2014 at 06:31:20PM +0200, Mathias Nyman wrote:
> On 05.11.2014 21:28, Felipe Balbi wrote:
> > Hi,
> > 
> > On Tue, Oct 14, 2014 at 04:34:00PM +0300, Mathias Nyman wrote:
> >>>>> Could you try with xhci debugging enabled? (will probably produce a
> >>>>> lot of output)
> >>>>>
> >>>>> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
> >>>>
> >>>> I'll try, sure.
> >>>
> >>> I used tracing otherwise the problem wouldn't show up. Attached you can
> >>> find output:
> >>>
> >>> 0b7e070de7b65de9f70805f4639b3e58  xhci-timeout-testusb.txt.gz
> >>>
> >>
> >> Thanks, looks like we end up calling cleanup_halted_endpoint() a lot.
> >> This will (try to) reset the endpoint and move on to handle the next TD (URB).
> >>
> >> This is called when we're processing control transfers and something out of the ordinary happens (a returned STALL, BABBLE, or some other reason).
> >>
> >> I need to dig a bit deeper to know what actually is going on. 
> > 
> > Any news here? It's been almost a month.
> > 
> 
> While looking at this and other bugs I found races between the reset endpoint, reset device, and set dequeue pointer commands.
> I suspect the loop in your logs is due to starting the endpoint ring too early after reset. It restarts before we move
> past the problematic TD, and starts executing it again.
> 
> The logs don't show why the TD fails in the first place, but I have another patch fixing other race issues which might help.
> 
> Both patches are now in a "reset-rework" topic branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git reset-rework
> 
> It's based on 3.18-rc2.
> I still haven't got or set up a USB device with gadget zero to test it out myself.

I'll try to run it today or tomorrow.
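
To make the suspected ordering problem concrete, here is a toy model of it. This is only a sketch, not xhci_hcd code: the ring structure and every toy_* name are hypothetical, and it assumes that a reset clears the halt without moving the dequeue pointer. If the ring is restarted before the dequeue pointer moves past the failing TD, the same TD stalls again on every round.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are NOT the xhci_hcd structures; all names are hypothetical. */
enum td_status { TD_OK, TD_STALL };

struct toy_ring {
	const enum td_status *tds;	/* outcome each TD produces when executed */
	size_t num_tds;
	size_t dequeue;			/* index of the next TD to execute */
	bool halted;
};

/* Execute TDs from the dequeue pointer until the ring is empty or one stalls. */
static void toy_run(struct toy_ring *r)
{
	while (!r->halted && r->dequeue < r->num_tds) {
		if (r->tds[r->dequeue] == TD_STALL) {
			r->halted = true;	/* dequeue still points at the failing TD */
			break;
		}
		r->dequeue++;
	}
}

/* Models a reset endpoint: clears the halt, leaves the dequeue pointer alone. */
static void toy_reset_endpoint(struct toy_ring *r)
{
	r->halted = false;
}

/* Models a set dequeue pointer: used here to skip the failing TD. */
static void toy_set_dequeue(struct toy_ring *r, size_t idx)
{
	r->dequeue = idx;
}

/*
 * Returns how many times the stalling TD was executed, giving up after
 * max_rounds. With restart_early the ring runs again before the dequeue
 * pointer has moved, which is the ordering suspected in this thread.
 */
static size_t toy_scenario(bool restart_early, size_t max_rounds)
{
	static const enum td_status tds[] = { TD_OK, TD_STALL, TD_OK };
	struct toy_ring r = { tds, 3, 0, false };
	size_t stalls = 0;

	toy_run(&r);
	while (r.halted && stalls < max_rounds) {
		size_t failed = r.dequeue;

		stalls++;
		toy_reset_endpoint(&r);
		if (!restart_early)
			toy_set_dequeue(&r, failed + 1);	/* move past the bad TD first */
		toy_run(&r);
	}
	return stalls;
}
```

In this model toy_scenario(false, 5) hits the stalling TD once and then finishes the ring, while toy_scenario(true, 5) hits it on every round until the limit is reached.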

-- 
balbi