Re: Control message failures kill entire XHCI stack

On 26.01.2015 05:37, Devin Heitmueller wrote:
> Hi Mathias,
> 
> Here's an interesting development:  as a result of a related thread on
> linux-media, I came across a patch they are distributing in openelec:
> 
> https://github.com/OpenELEC/OpenELEC.tv/commit/b636927dec20652ff020e54ed7838a2e9be51e03
> 
> Now I'm not saying that reverting the commit in question is the
> "right" thing to do, but I applied this patch and for the first time
> in 100+ tests it started to work (i.e. I'm not seeing the XHCI hcd
> tear down all the attached devices).
> 
> Given what I've seen of the bug I cannot really explain why the
> scatter gather list sizes would have any bearing on TRBs for USB
> control messages to be added to the queue.  Perhaps we're hitting the
> upper bound of the list?  Any further speculation on my part would
> just make me look clueless...
> 
> It would be great if you could offer any insight as to why the patch
> in question could be responsible for the behavior we're seeing.  I
> would really rather not just blindly check this patch into my local
> tree and declare "victory" without understanding the underlying issue
> and whether it's likely to cause other problems.
> 

Hi

Thanks for digging this info out.

I'm starting to think there could be something wrong in the ring expansion,
or in ring memory management in general.

In all the logs I got, the failed control endpoint stop was preceded by a ring expansion.
If the ring expansion already kills xhci, it would explain why we never see the control message on the bus.

The patch you point to increases TRBS_PER_SEGMENT from 64 to 256, which in turn reduces the need for ring expansion,
as each segment is already four times bigger initially.
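
To put rough numbers on the segment size argument, here is a minimal sketch of the arithmetic.
It is a simplified model, not the driver code: the helper names usable_trbs() and segments_to_add()
are made up for illustration, and it assumes the last TRB of every segment is reserved as a link TRB.

```c
#include <assert.h>

/* Hypothetical simplified model of ring sizing, NOT the xhci driver code. */

/* Usable TRBs in one segment. Assumption: the last slot holds a link TRB. */
static unsigned int usable_trbs(unsigned int trbs_per_segment)
{
    assert(trbs_per_segment > 1);
    return trbs_per_segment - 1;
}

/*
 * Number of extra segments needed to queue `needed` TRBs when the ring
 * currently has `free_trbs` free. A result of 0 means no expansion.
 */
static unsigned int segments_to_add(unsigned int free_trbs,
                                    unsigned int needed,
                                    unsigned int trbs_per_segment)
{
    unsigned int usable = usable_trbs(trbs_per_segment);
    unsigned int shortfall;

    if (needed <= free_trbs)
        return 0;

    shortfall = needed - free_trbs;
    /* Round up to whole segments. */
    return (shortfall + usable - 1) / usable;
}
```

In this model a request for 100 TRBs on an empty single-segment ring needs one extra segment
with 64 TRBs per segment, and none with 256, so the expansion path is simply hit less often.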

Another thing pointing to ring memory management is the additional checks I added printing:

"Cancelled TD not on stopped ring"
"Cancel URB NOT on current ring"

Hitting these means that the URB we want to cancel points to a segment that is no longer part of that endpoint's ring,
but we still alter the memory of that segment.
We cache, free, and re-use segments as endpoints are dropped and added, or as rings are expanded,
so we might have randomly altered a segment that just got picked up by another ring after a ring expansion.
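
The kind of check behind those messages can be sketched roughly like this. It is only an illustration
under assumptions: struct seg, struct ring and seg_on_ring() are hypothetical stand-ins rather than the
real xhci structures, and the ring is assumed to be a circular singly linked list of segments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a ring segment and a ring, NOT the xhci types. */
struct seg {
    struct seg *next;       /* next segment; the last one points back to the first */
};

struct ring {
    struct seg *first_seg;  /* any segment of the circular list */
};

/*
 * Walk the ring once and report whether `target` is still one of its
 * segments. A cancel path could refuse to touch the segment's memory
 * when this returns false.
 */
static bool seg_on_ring(const struct ring *ring, const struct seg *target)
{
    const struct seg *s;

    if (!ring || !ring->first_seg || !target)
        return false;

    s = ring->first_seg;
    do {
        if (s == target)
            return true;
        s = s->next;
    } while (s && s != ring->first_seg);

    return false;
}
```

A stale URB whose segment pointer fails this test would be exactly the case where writing to the
segment risks corrupting whichever ring owns that memory now.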

At least that's a new theory.
I need to do more hacking.

-Mathias  

