Re: [PATCH] usb: xhci: ring: fix off-by-one error

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> writes:
>> Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> writes:
>>> On 03.05.2016 13:30, Felipe Balbi wrote:
>>>> When trying to access our last TRB, XHCI was
>>>> actually reading memory outside of the TRB
>>>> array/ring due to an off-by-one error.
>>>>
>>>> This patch fixes that error and has the side effect
>>>> of also fixing some rare situations where long mass
>>>> storage transfers would timeout and XHCI would reset
>>>> the mass storage device before continuing.
>>>>
>>>> Signed-off-by: Felipe Balbi <felipe.balbi@xxxxxxxxxxxxxxx>
>>>> ---
>>>>    drivers/usb/host/xhci-ring.c | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>>>> index 6c41dbbf9f2f..ba610a72c396 100644
>>>> --- a/drivers/usb/host/xhci-ring.c
>>>> +++ b/drivers/usb/host/xhci-ring.c
>>>> @@ -95,7 +95,7 @@ static bool last_trb_on_last_seg(struct xhci_hcd *xhci, struct xhci_ring *ring,
>>>>    		struct xhci_segment *seg, union xhci_trb *trb)
>>>>    {
>>>>    	if (ring == xhci->event_ring)
>>>> -		return (trb == &seg->trbs[TRBS_PER_SEGMENT]) &&
>>>> +		return (trb == &seg->trbs[TRBS_PER_SEGMENT - 1]) &&
>>>>    			(seg->next == xhci->event_ring->first_seg);
>>>>    	else
>>>>    		return le32_to_cpu(trb->link.control) & LINK_TOGGLE;
>>>> @@ -109,7 +109,7 @@ static int last_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
>>>>    		struct xhci_segment *seg, union xhci_trb *trb)
>>>>    {
>>>>    	if (ring == xhci->event_ring)
>>>> -		return trb == &seg->trbs[TRBS_PER_SEGMENT];
>>>> +		return trb == &seg->trbs[TRBS_PER_SEGMENT - 1];
>>>>    	else
>>>>    		return TRB_TYPE_LINK_LE32(trb->link.control);
>>>>    }
>>>>
>>>
>>> Thanks, this needs to be fixed, but there are some changes needed to inc_enq()
>>> as well together with this fix.
>>> Otherwise the last TRB of a event ring won't be used
>>
>> Any idea how a problem due to this could be triggered ? So far I can't
>> get any failures. BTW, the last TRB *will* be used by the HW anyway, no?
>> A truer argument would be that Linux XHCI driver might miss handling
>> that event. I still can't get this to fail. It seems to me that it's
>> pretty much impossible for us to miss an event.
>>
>> If we consider what happens from the time an IRQ fires, we will have the
>> following sequence of events:
>>
>> MSI/MSI-X/normal IRQ asserted
>>   xhci_irq()
>>    while(xhci_handle_event(xhci) > 0);
>>     event = xhci->event_ring->dequeue;
>>      /* handle the event given its type */
>>       if (update_ptrs) inc_deq();
>>        ring->deq_updates++;
>>        do {
>>         if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue))
>>          if (ring->type == EVENT && last_trb_on_last_seg())
>>            ring-cycle_state ^= 1;
>>           ring->deq_seg = ring->deq_seg->next;
>>           ring->dequeue = ring->deq_seg->trbs;
>>          else
>>           ring->dequeue++
>>        } while (last_trb())
>>
>
> event ring:
> trb[0]
> ...
> trb[254]
> trb[255] = the last trb on the ring, still used for events as event
> ring has no link trb.
>
> One scenario where the dequeue pointer, both the actual hw one, and
> the sw one is at trb[254]
>
> xhci hw writes an event at trb[254], event ring dequeue is at trb[254]
>   - interrupt, we handle the event at dequeue trb[254]
>   - call inc_deq(), last_trb(trb[254]) check is false so instead we increase the dequeue.
>   - dequeue is now at trb[255], while(last_trb(trb[255]) will now be true, so go back to "do{".
>   - last_trb(trb[255]) check is now true, so we jump to next segment.
>
> so now the driver thinks the dequeue pointer is at the next segment,
> but hw is going to write the next event at trb[255] which the driver
> is not going to notice -> drive misses one event.

Good point, but it seems to me the problem is that do {} while() loop on
inc_deq(), how can we have more than one last TRB ? Granted, that check
is different for non-EVENT rings (which checks for a link TRB instead -
also wrong check as Link might not be last).

Also, I haven't managed to trigger any failures from this but I have to
fix my commit log as the lock-up -> timeout -> Reset on the mass storage
device still happens even with $subject, albeit not as frequently.

Anyway, seems like below is also needed (maybe as patch 1 on a series of
two patches ?)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index ba610a72c396..6f991b7c4249 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -153,24 +153,21 @@ static void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring)
                        !last_trb(xhci, ring, ring->deq_seg, ring->dequeue))
                ring->num_trbs_free++;
 
-       do {
+       ring->dequeue++;
+
+       if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) {
                /*
                 * Update the dequeue pointer further if that was a link TRB or
                 * we're at the end of an event ring segment (which doesn't have
                 * link TRBS)
                 */
-               if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) {
-                       if (ring->type == TYPE_EVENT &&
-                                       last_trb_on_last_seg(xhci, ring,
-                                               ring->deq_seg, ring->dequeue)) {
-                               ring->cycle_state ^= 1;
-                       }
-                       ring->deq_seg = ring->deq_seg->next;
-                       ring->dequeue = ring->deq_seg->trbs;
-               } else {
-                       ring->dequeue++;
-               }
-       } while (last_trb(xhci, ring, ring->deq_seg, ring->dequeue));
+               if (ring->type == TYPE_EVENT && last_trb_on_last_seg(xhci, ring,
+                                               ring->deq_seg, ring->dequeue))
+                       ring->cycle_state ^= 1;
+
+               ring->deq_seg = ring->deq_seg->next;
+               ring->dequeue = ring->deq_seg->trbs;
+       }
 }
 
 /*
@@ -207,7 +204,7 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring,
        /* Update the dequeue pointer further if that was a link TRB or we're at
         * the end of an event ring segment (which doesn't have link TRBS)
         */
-       while (last_trb(xhci, ring, ring->enq_seg, next)) {
+       if (last_trb(xhci, ring, ring->enq_seg, next)) {
                if (ring->type != TYPE_EVENT) {
                        /*
                         * If the caller doesn't plan on enqueueing more
@@ -218,7 +215,7 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring,
                         * the TD at the top of the ring.
                         */
                        if (!chain && !more_trbs_coming)
-                               break;
+                               return;
 
                        /* If we're not dealing with 0.95 hardware or
                         * isoc rings on AMD 0.96 host,

>> Oh; and, BTW, a further cleanup would be:
>
> Yes, looking at this there are several cleanups needed. For example inc_enq()
> should actually not bother with event ring at all. HW is the producer in the
> event ring, xhci driver (consumer) should just take care of the dequeue pointer.
>
> We shouldn't mix link and last trbs helpers like we currently do, I'd prefer
> functions like:
>
> last_trb_on_seg(trb)
> last_trb_on_ring(trb, ring)
> is_link_trb(trb)
> is_cycle_toggle_link_trb(trb)

are you doing these changes ? If you are, do you mind prepending all of
these functions with xhci_ or something like that ? It really helps
grepping, ftrace, dump_stack(), etc.

-- 
balbi

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux