Re: XHCI: Ring expansion failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Wed, Nov 06, 2013 at 11:09:50PM -0000, hemantk@xxxxxxxxxxxxxx wrote:
>> >> On Wed, Nov 6, 2013 at 12:33 PM,  <hemantk@xxxxxxxxxxxxxx> wrote:
>> >>>> On Wed, Nov 6, 2013 at 12:53 AM,  <hemantk@xxxxxxxxxxxxxx> wrote:
>> >>>>> Hi
>
> Hi Hemant,
>
>> >>>>> By performing iterative port suspend and resume (which results
>> >>>>> in function suspend and resume), ring expansion failure is
>> >>>>> observed. Attached device has multiple interfaces for which
>> >>>>> interface host drivers are unlinking the urbs during function
>> >>>>> suspend and submitting urbs during resume.
>> >>>>>
>> >>>>> For the cases when dequeue ptr is close to the end of deq
>> >>>>> segment, value of num_trbs_in_deq_seg (= ring->dequeue -
>> >>>>> ring->deq_seg->trbs) becomes high when function suspend is over.
>> >>>>> At the time of resume this high value is causing to trigger ring
>> >>>>> expansion in the middle of submitting urbs(ring->num_trbs_free <
>> >>>>> num_trbs + num_trbs_in_deq_seg).  All the interface host drivers
>> >>>>> only request one TRB per urb (num_trb =1).  Ring expansion logic
>> >>>>> is doubling the num of previously allocated segments every time.
>> >>>>> This is causing the num of segments to grow at fast pace (at the
>> >>>>> time of failure, one of the ep rings num_seg was 128), even
>> >>>>> though it is also increasing num_trbs_free.
>
> Sorry for taking so long to respond to this.  It's definitely something
> we need to look into.
>
> Which type of USB device caused this out-of-control ring expansion?  Can
> you provide us a link for where to buy it, if it's a specific device?
> It would make it easier for us to test this.
No it is not device specific issue. I continued debugging this issue
further after writing to linux-usb. I am hitting the issue easily on
interrupt IN ep ring. I was able to get to the point due to which issue is
happening. I have a driver which is queuing 17 urb on interrupt IN ep and
when suspend happens it kills one urb at a time which means this results
into 17 stop ep cmds which gets finished one after the other. I added some
dmesg logs for this ep to print SW deq ptr and also the deq ptr xHC is on
in handle_tx_event which will be called as a result of issuing stop ep
command. For good cases i see  that xHC was able to skip though the No-OP
TRBs when stop ep cmd was issued for urbs :-

   21.700565] handle_tx_event: Stopped on No-op or Link TRB
[   21.705895] handle_tx_event NO OP EVT TRB: ed0da4c0 ep_ring->enq =
ed0da5a0 ep_ring->deq = ed0da4c0 ep_ring->num_trbs_free 111

[   21.717272] handle_stopped_endpoint: MARKED NO-OP TRB: ed0da4d0 curr td
= e9dbd080 ep->stopped_td =   (null) ep_ring->enq = ed0da5a0 ep_ring->deq
= ed0da4c0 ep_ring->num_trbs_free 111
[   21.733648] handle_tx_event: Stopped on No-op or Link TRB
[   21.738968] handle_tx_event NO OP EVT TRB: ed0da4d0 ep_ring->enq =
ed0da5a0 ep_ring->deq = ed0da4c0 ep_ring->num_trbs_free 111

Here you can see that TRB @  0xed0da4c0 was marked as NO-OP and xHC deq
ptr is on that NO-OP TRB. For the next stop ep cmd which marked the TRB
0xed0da4d0 as No-OP when door bell was rung handle_tx_event happened for
0xed0da4d0 which means controller is skipping though the No-OP TRBs on the
way. Also, with controller skipping TRBs there could be a situation when
TD that we are trying to cancel, xHC deq ptr is on corresponding TRB which
will cause set TR deq cmd to get executed.


In bad case i am observing that xHC deq ptr is stuck to a No-OP TRB
(0xed0da420) and for all the subsequent stop ep cmd it is generating
handle_tx_event for same No-OP TRB like this (pasting few of the logs) :-

[   23.126453] handle_tx_event: Stopped on No-op or Link TRB
[   23.130811] handle_tx_event NO OP EVT TRB: ed0da420 ep_ring->enq =
ed0da550 ep_ring->deq = ed0da420 ep_ring->num_trbs_free 106

[   23.142185] handle_stopped_endpoint: MARKED NO-OP TRB: ed0da440 curr td
= e9d9d400 ep->stopped_td =   (null) ep_ring->enq = ed0da550 ep_ring->deq
= ed0da420 ep_ring->num_trbs_free 106
[   23.158569] handle_tx_event: Stopped on No-op or Link TRB
[   23.163883] handle_tx_event NO OP EVT TRB: ed0da420 ep_ring->enq =
ed0da550 ep_ring->deq = ed0da420 ep_ring->num_trbs_free 106

[   23.175255] handle_stopped_endpoint: MARKED NO-OP TRB: ed0da450 curr td
= e9d9d300 ep->stopped_td =   (null) ep_ring->enq = ed0da550 ep_ring->deq
= ed0da420 ep_ring->num_trbs_free 106
[   23.191698] handle_tx_event: Stopped on No-op or Link TRB
[   23.196955] handle_tx_event NO OP EVT TRB: ed0da420 ep_ring->enq =
ed0da550 ep_ring->deq = ed0da420 ep_ring->num_trbs_free 106

[   23.208329] handle_stopped_endpoint: MARKED NO-OP TRB: ed0da460 curr td
= e9d9d200 ep->stopped_td =   (null) ep_ring->enq = ed0da550 ep_ring->deq
= ed0da420 ep_ring->num_trbs_free 106

Here you can see  that all the times handle_tx_event is generated for same
NO-OP TRB. So controller is not skipping through the NO-OP TRBs which are
getting marked here (ed0da440, ed0da450, ed0da460) even after door bell
was rung. This causes all the TRBs to become No-OP TRBs when all  the URBs
are cancelled. Now there is no way set tr deq ptr cmd will be called and
controller remains stuck to the same No-OP TRB.
After this when resume happens and driver queues the 17 urbs on same ring
and rings the doorbell (lets assume no data transfer happened) and then
function suspend happens which will issue stop ep cmd for 17 urbs and
again we can see handle_tx_event getting called and showing that xHC HW
deq ptr didn't move and still on same No-OP TRB:-

23.976402] handle_tx_event: Stopped on No-op or Link TRB
[   23.980760] handle_tx_event NO OP EVT TRB: ed0da420 ep_ring->enq =
ed0da660 ep_ring->deq = ed0da420 ep_ring->num_trbs_free 89
[   23.992047] handle_stopped_endpoint: MARKED NO-OP TRB: ed0da550 curr td
= e9d9d480 ep->stopped_td =   (null) ep_ring->enq = ed0da660 ep_ring->deq
= ed0da420 ep_ring->num_trbs_free 89
[   24.008373] handle_tx_event: Stopped on No-op or Link TRB
[   24.013658] handle_tx_event NO OP EVT TRB: ed0da420 ep_ring->enq =
ed0da660 ep_ring->deq = ed0da420 ep_ring->num_trbs_free 89

As a result we keep on marking all TRBs to No-OP and not calling set tr
deq ptr at all. This ultimately results in frequent ring expansions and
eventually causes ring expansion failure.

I am still not able to figure out why suddenly xHC hw deq ptr gets stuck
to a NO-OP TRB and never moves further.

I have tried workaround for this issue to zero out the entire ring and
reset SW enq and deq ptr to the start of the first seg of the ring and
issuing set tr deq ptr to the same TRB pointed by SW deq ptr just before
issuing the device suspend. I am doing this only for the rings for which
SW enq and deq ptr are not same. This is helping me to get rid of the xHC
deq ptr stuck problem as i am forcing the controller to move to the start
of the ring. But there is a corner case which is not covered with this
workaround.If one the interfaces is aborting the suspend for some reason
multiple times and bus suspend never happened. So driver who unlinked the
urb and get some of the urbs marked NO-OP may run into same issue as my
workaround is part of the bus suspend just before xhci_stop_device() gets
called.

Have you experienced this kind of issue with any xhci controller ? Please
let me know your thoughts.
>
>> >>>> Sounds like a bug. Why num_trbs_in_deq_seg becomes high? How many
>> urbs
>> >>>> are submitted during resume? Normally only isoc transfer triggers
>> ring
>> >>>> expansion with multiple trbs per urb.
>> >>>
>> >>> Let me explain it little further what is being observed in terms
>> >>> of enq , deq ptr:-
>> >>>
>> >>> For example
>> >>> 1)      Let's say First seg of the ring starts with 0xed0d3000
>> >>> which means enq ptr and deq ptr are pointing to this address.
>> >>> 2)      Interface driver queuing 10 urbs (one urb represents one
>> >>> trb) enq ptr goes to ed0d30a0.
>> >>> 3)      Interface suspend happened. All 10 urbs are going to get
>> >>> unlinked. For the first urb to be unlinked, it is added to
>> >>> ep->cancelled_td_list  and Stop ep cmd was called (rest of the
>> >>> urbs added to ep->cancelled_td_list before stop ep cmd completed).
>> >>> 4)      When stop ep cmd is completed, for 9 urbs td_to_noop() is
>> >>> called.  For the first urb (for which stop ep cmd was called),
>> >>> xhci_find_new_dequeue_state() gets called which updates deq ptr to
>> >>> next trb (0x ed0d3010) so set tr deq ptr cmd is issued to xhci to
>> >>> set its deq ptr to next trb(0xed0d3010).
>> >>> 5)      In set tr deq cmd completion handler
>> >>> update_ring_for_set_deq_completion() gets called which will just
>> >>> increments num_trbs_free to only one, but we unlinked all the
>> >>> urbs. So when interface suspend finished we have enq ptr pointing
>> >>> to 0x ed0d30a0 and dep ptr pointing to 0x ed0d3010. With
>> >>> num_trbs_free only incremented by one.
>> >>> 6)      At the time of interface resume we re-queue 10 urbs back
>> >>> with num_trbs_free (just incremented 1 during suspend). This means
>> >>> num_trbs_free is incrementing very slowly every time interface
>> >>> suspend happens. And it is very easy to get into a situation where
>> >>> room_on_ring() returning 0 as deq ptr will be reaching to the end
>> >>> of the seg and causing num_trbs_in_deq_seg to go high enough that
>> >>> ring->num_trbs_free < num_trbs + num_trbs_in_deq_seg condition
>> >>> becomes true.
>> >>>
>> >>
>> >> So it's because the suspend/resume loops and urbs are enqueued and
>> >> unlinked, deq_ptr does not follow enq_ptr. This is expected since
>> >> deq_ptr and num_trbs_free increase in inc_deq(), which is called by
>> >> irq handler. As there is no transfer on this ring, deq_ptr will not
>> >> get updated.
>
> Yep, I think this is the root cause.  If we're just queuing and then
> canceling URBs, the no-op TRBs get executed, but we don't get an event
> back for them, so the dequeue pointer remains on the first canceled TRB.
>
>> > Considering the suspend resume scenario, with this behavior don't you
>> > think ring expansion will be triggered very often and for no reason
>> (as we
>> > are just doing suspend and resume).
>> >
>> > In handle_stopped_endpoint() when traversing to ep->cancelled_td_list
>> is
>> > it better to call xhci_find_new_dequeue_state() for last td and call
>> > td_to_noop() for all previous tds. Doing that we will update deq ptr
>> to
>> > set to enq ptr with correct value of num_trbs_free when interface
>> suspends
>> > finishes.
>
> There's no guarantee that the last URB queued for cancellation is the
> last TD physically in the ring.  The URBs could be queued for
> cancellation in any order, and thus go on to cancelled_td_list in any
> order.
>
> We would also have issues using this solution for endpoints with streams
> enabled, where there are multiple stream rings associated with one
> endpoint.  The current code assumes that the host controller has stopped
> on one particular endpoint ring, which may need a Set TR dequeue
> command, and other rings will only have TDs turned into no-op TRBs.
> Otherwise the current code that stores the dequeue in a local pointer
> and issues one command to move the dequeue pointer won't work.
>
>> i think instead of what suggested above, we need to somehow update
>> num_trbs_free to the correct value any time before interface driver is
>> going to submit the first urb as a result of interface resume. As after
>> resume xhci is going to skip no op trbs and starts fetching the transfer
>> trbs on the ring.
>
> That doesn't help the case where the driver is canceling URBs as part of
> normal operation, not suspend and resume.  I would rather fix this for
> the generic case than focus on suspend in particular.
>
>> >> I'm still curious about the failure of ring expansion. 128 segments
>> >> occupy 128KB, which should not be a big deal. Ring expansion is
>> >> expected to run until there is no memory to allocate. Does it fail
>> >> because of allocation failure?
>> > yes, xhci_ring_expansion() returns -ENOMEM. There are 19 rings
>> allocated.
>> >>>
>> >>>>> Even for the cases where we observe data transfer as well as
>> >>>>> suspend & resume (which are normal scenarios) ring expansion
>> >>>>> failure is easily observed due to large number of segments being
>> >>>>> allocated for ever.  This means triggering of ring expansion
>> >>>>> will result in failure of further expansion at some point of
>> >>>>> time.
>> >>>>>
>> >>>>> I was trying to think of some options to reduce ring expansion
>> >>>>> with respect to suspend resume scenario: -
>> >>>>> 1)      Can we reset enq and deq ptr to point to ep_ring first
>> >>>>> seg once all the urbs are unlinked (function suspend)?
>
> You would have to re-initialize the rings as well.  Otherwise past TDs
> would be executed.  But again, I would like to address this issue
> generically, and not make some special case for suspend.
>
>> >>>>> 2)      Can we free the segments at the time of suspend and
>> >>>>> allocate segments back when resume happens?
>> >>>>>
>> >>>>> In general, as per the spec can we also implement shrinking of
>> >>>>> transfer
>> >>>>> ring (4.9.2.4).
>> >>>>>
>> >>>>> Please help to provide your comments on the options above (which
>> can
>> >>>>> easily fit in to the current design) also feel free to add better
>> >>>>> options
>> >>>>> to consider to avoid ring expansion failures which can be added
>> >>>>> easily
>> >>>>> to
>> >>>>> the current implementation.
>
> I think there's a couple of ways we could fix this.
>
> One would be to set the IOC flag on the last no-op TRBs in a TD we're
> trying to cancel.  I think that will make the event handling code
> increment the dequeue pointer and update the number of free TRBs on the
> ring (or the code could be changed to do so).  That would take care of
> the num_trbs_free counting issue for URB cancellation that isn't part of
> device suspend.
>
> A second option would be to make the cancellation code smarter, and move
> the dequeue command past the last canceled TRB that was turned into a
> no-op TRB.  The code would have to be smart to handle cases where
> there's no-op TRBs interspersed with transfer TRBs.  For example, when
> the driver queues URBs 1, 2, and 3 to the ring, but only cancels URBs 1
> and 3.  Possibly once all TDs are turned into no-ops, we could just walk
> the ring from the dequeue pointer, stop on the first normal TRB, and
> then set the dequeue pointer to that.  The code would have to make sure
> to do this only for endpoints that aren't enabled for streams.
>
> A third option would be to implement some new code as a host controller
> callback for device suspend/resume.  Right now, we really should be
> issuing a Stop Endpoint command with the suspend bit set for each device
> in the tree, but we only do so for roothub ports when the bus is
> suspended.  We know all URBs need to be canceled before the device is
> suspended, so we could do something like issue a Stop Endpoint command
> for each endpoint ring, and then shrink the ring back down to two
> segments and re-initialize them.  The device resume hook would need to
> issue a Set TR dequeue pointer for the rings.  The code would have to
> handle stream endpoint rings as well, which makes it much more complex.
>
> Option 1 is the simplest to implement, I think.  However, it does mean
> that the host controller has to generate a lot of events, which some
> hosts may have issues with.
>
> Option 2 is medium level complexity, but it has a nice side benefit.
> Right now, no-op TRBs on a periodic endpoint mean the host should skip
> that service interval.  Since the endpoint ring is still running when
> the device is suspended, I think that means the host will skip those
> intervals while the device is suspended.  However, if the endpoint has a
> long service interval with a short suspend time, we could potentially
> see the host skip some service intervals on resume before getting to the
> next normal TRB.
>
> With Option 2 in place, the dequeue pointer will be moved past all 10
> canceled TRBs, and no no-op TRBs will be unnecessarily generated.  So I
> think I like option 2 better.
>
> While we eventually need to implement the device suspend hook to issue
> the Stop Endpoint command with the suspend bit set, I think option 3 is
> probably too complex to consider.
>
> Are there other ways we could solve this?
>
> Sarah Sharp
>


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux