Re: XHCI: Ring expansion failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 6, 2013 at 12:33 PM,  <hemantk@xxxxxxxxxxxxxx> wrote:
>> On Wed, Nov 6, 2013 at 12:53 AM,  <hemantk@xxxxxxxxxxxxxx> wrote:
>>> Hi
>>>
>>> By performing iterative port suspend and resume (which results in
>>> function
>>> suspend and resume), ring expansion failure is observed. Attached device
>>> has multiple interfaces for which interface host drivers are unlinking
>>> the
>>> urbs during function suspend and submitting urbs during resume.
>>>
>>> For the cases when dequeue ptr is close to the end of deq segment, value
>>> of num_trbs_in_deq_seg (= ring->dequeue - ring->deq_seg->trbs) becomes
>>> high when function suspend is over. At the time of resume this high
>>> value
>>> is causing to trigger ring expansion in the middle of submitting
>>> urbs(ring->num_trbs_free < num_trbs + num_trbs_in_deq_seg).  All the
>>> interface host drivers only request one TRB per urb (num_trb =1). Ring
>>> expansion logic is doubling the num of previously allocated segments
>>> every
>>> time. This is causing the num of segments to grow at fast pace (at the
>>> time of failure, one of the ep rings num_seg was 128), even though it is
>>> also increasing num_trbs_free.
>>>
>>
>> Sounds like a bug. Why num_trbs_in_deq_seg becomes high? How many urbs
>> are submitted during resume? Normally only isoc transfer triggers ring
>> expansion with multiple trbs per urb.
>
> Let me explain it little further what is being observed in terms of enq ,
> deq ptr:-
>
> For example
> 1)      Let’s say First seg of the ring starts with 0xed0d3000 which means enq
> ptr and deq ptr are pointing to this address.
> 2)      Interface driver queuing 10 urbs (one urb represents one trb) enq ptr
> goes to ed0d30a0.
> 3)      Interface suspend happened. All 10 urbs are going to get unlinked. For
> the first urb to be unlinked, it is added to ep->cancelled_td_list  and
> Stop ep cmd was called (rest of the urbs added to ep->cancelled_td_list
> before stop ep cmd completed).
> 4)      When stop ep cmd is completed, for 9 urbs td_to_noop() is called.  For
> the first urb (for which stop ep cmd was called),
> xhci_find_new_dequeue_state() gets called which updates deq ptr to next
> trb (0x ed0d3010) so set tr deq ptr cmd is issued to xhci to set its deq
> ptr to next trb(0xed0d3010).
> 5)      In set tr deq cmd completion handler
> update_ring_for_set_deq_completion() gets called which will just
> increments num_trbs_free to only one, but we unlinked all the urbs. So
> when interface suspend finished we have enq ptr pointing to 0x ed0d30a0
> and dep ptr pointing to 0x ed0d3010. With num_trbs_free only incremented
> by one.
> 6)      At the time of interface resume we re-queue 10 urbs back with
> num_trbs_free (just incremented 1 during suspend). This means
> num_trbs_free is incrementing very slowly every time interface suspend
> happens. And it is very easy to get into a situation where room_on_ring()
> returning 0 as deq ptr will be reaching to the end of the seg and causing
> num_trbs_in_deq_seg to go high enough that ring->num_trbs_free < num_trbs
> + num_trbs_in_deq_seg condition becomes true.
>

So it's because the suspend/resume loops and urbs are enqueued and
unlinked, deq_ptr does not follow enq_ptr. This is expected since
deq_ptr and num_trbs_free increase in inc_deq(), which is called by
irq handler. As there is no transfer on this ring, deq_ptr will not
get updated.

I'm still curious about the failure of ring expansion. 128 segments
occupy 128KB, which should not be a big deal. Ring expansion is
expected to run until there is no memory to allocate. Does it fail
because of allocation failure?

Thanks,
Andiry

>
>>> Even for the cases where we observe data transfer as well as suspend &
>>> resume (which are normal scenarios) ring expansion failure is easily
>>> observed due to large number of segments being allocated for ever. This
>>> means triggering of ring expansion will result in failure of further
>>> expansion at some point of time.
>>>
>>> I was trying to think of some options to reduce ring expansion with
>>> respect to suspend resume scenario: -
>>> 1)      Can we reset enq and deq ptr to point to ep_ring first seg once
>>> all the
>>> urbs are unlinked (function suspend)?
>>> 2)      Can we free the segments at the time of suspend and allocate
>>> segments
>>> back when resume happens?
>>>
>>
>> These does not point to the root cause of ring expansion failure.
>>
>> Thanks,
>> Andiry
>>
>>> In general, as per the spec can we also implement shrinking of transfer
>>> ring (4.9.2.4).
>>>
>>> Please help to provide your comments on the options above (which can
>>> easily fit in to the current design) also feel free to add better
>>> options
>>> to consider to avoid ring expansion failures which can be added easily
>>> to
>>> the current implementation.
>>>
>>> Currently I am using 3.10 kernel and back ported recent fixes on xhci.
>>>
>>> Thanks,
>>> Hemant
>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux