Re: [RFC v3 7/8] xHCI: wait for dequeue pointer move pass link TRB

Andiry Xu <andiry.xu@xxxxxxx> · Mon, 12 Dec 2011 16:33:52 +0800

On 12/10/2011 02:43 AM, Sarah Sharp wrote:
> On Fri, Dec 09, 2011 at 05:32:57PM +0800, Andiry Xu wrote:
>> On 11/11/2011 04:36 AM, Sarah Sharp wrote:
>>> On Thu, Nov 10, 2011 at 05:12:04PM +0800, Andiry Xu wrote:
>>>> When trying to expand a ring, if enqueue pointer is behind dequeue pointer
>>>> and they're in the same segment, we can not expand the ring until the
>>>> dequeue pointer move pass the link TRB. This is because if the hardware
>>>> has not cached the link TRB, it will jump to the new segments, but the
>>>> data there is not right and the hardware may stop if the cycle_bit
>>>> indicates it's software owned.
>>>>
>>>> Wait for the dequeue pointer move pass the link TRB, and then it's safe
>>>> to expand the ring. Pending all the urbs submitted during the waiting period
>>>> and queue them after the ring successfully expanded. When an endpoint is
>>>> stopped, clear all the pending urbs.
>>>
>>> I'm not sure why you're giving back pending URBs when the endpoint
>>> stops.  There's no reason that canceling a URB that has made it onto
>>> the ring should suddenly cancel all the URBs waiting to get on the ring.
>>> Only the watchdog timer running should cause all the pending URBs to get
>>> given back.
>>>
>>> What I wanted you to do was check in urb_dequeue whether the URB to be
>>> canceled was in the pending queue and just take it out of the pending
>>> queue.  That you wouldn't have to even issue the stop endpoint command.
>>> You're going to have to decrement the pending_num_trbs when you take it
>>> out of the list, of course.
>>>
>>> Also, why do you need both pending_num_trbs and waiting_deq_ptr?  When
>>> checking to see if the ring needs to be expanded, why not just check if
>>> the TRB count is non-zero or if the pending_urb list is empty?
>>>
>>> I think there might be a couple other subtle issues, but I need to apply
>>> your patchset and check.
>>>
>>
>> After re-examine this patch, I found there is another design issue. The
>> mechanism when the special case (waiting for host walk around the link
>> TRB) triggered is:
>>
>> 1. Hold the urbs in a list, report to usbcore that they're successfully
>> submitted.
>> 2. Waiting for the host walk around the link TRB.
>> 3. Expand the ring and queue the pending urbs.
>>
>> However, we can not guarantee step3 will succeed: xhci_ring_expansion()
>> may fail with low memory, the endpoint state may change, and queueing of
>> the urbs may fail. So usbcore gets the report that urb is submitted
>> successfully, while it's actually failed. I don't think this is good. I
>> may re-consider the design.
> 
> If xhci_ring_expansion() fails because of any of those conditions, then
> you should probably to through the list of pending URBs and give them
> back with a suitable error message.
> 
> Or, for every URB that you need to allocate another ring segment for,
> allocate as many segments as you'll need in urb_enqueue, and store them
> in a segment cache in the xHCI ring structure.  When another URB comes
> in, take into account how many free TRBs you have in the segment cache,
> how many TRBs are needed in the current expansion queue, and add more to
> the cache if necessary.
> 
> I think the easier solution is to just call usb_giveback_urb() with an
> error code when you fail to expand the ring.  There's nothing wrong with
> that.  All it means when you return from urb_enqueue with a successful
> status is that the URB is queued.  Whether that's in a hardware queue or
> a software queue doesn't matter.  Remember, even the xHCI hardware can
> give a missed service event because it couldn't fetch memory fast
> enough.  That's sort of similar to what's happening here: we can't get
> memory for the ring fast enough, so the URBs are given back to the
> driver with an error code.
> 

Ah yes, that's right. When the expansion fails, just giveback all the
pending urbs should do.

>> Can I hold the patch temporarily and submit the remaining patchset by
>> now? It should work without the patch since this is a case very hard to
>> trigger. As more and more guys encounter the ring full issue, I think
>> the patchset should be merged. Sorry for the delay.
> 
> You want to merge patches 1-6 and patch 8 (the sgtable update), correct?
> 
> Yes, I think that would work.  The ring expansion will just fail
> occasionally without this patch, which drivers should be able to deal
> with.  I'll do some testing on the first couple of patches and get it
> off to Greg next week.
> 

Thanks. There is another bug in the last patchset (in patch3), I'll
resubmit the patchset later. Also I'll resubmit patch7 and move it to
the last position, it's up to you to take it or not.

Thanks,
Andiry

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html