Hi, Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> writes: >> Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> writes: >>> On 03.05.2016 13:30, Felipe Balbi wrote: >>>> When trying to access our last TRB, XHCI was >>>> actually reading memory outside of the TRB >>>> array/ring due to an off-by-one error. >>>> >>>> This patch fixes that error and has the side effect >>>> of also fixing some rare situations where long mass >>>> storage transfers would timeout and XHCI would reset >>>> the mass storage device before continuing. >>>> >>>> Signed-off-by: Felipe Balbi <felipe.balbi@xxxxxxxxxxxxxxx> >>>> --- >>>> drivers/usb/host/xhci-ring.c | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c >>>> index 6c41dbbf9f2f..ba610a72c396 100644 >>>> --- a/drivers/usb/host/xhci-ring.c >>>> +++ b/drivers/usb/host/xhci-ring.c >>>> @@ -95,7 +95,7 @@ static bool last_trb_on_last_seg(struct xhci_hcd *xhci, struct xhci_ring *ring, >>>> struct xhci_segment *seg, union xhci_trb *trb) >>>> { >>>> if (ring == xhci->event_ring) >>>> - return (trb == &seg->trbs[TRBS_PER_SEGMENT]) && >>>> + return (trb == &seg->trbs[TRBS_PER_SEGMENT - 1]) && >>>> (seg->next == xhci->event_ring->first_seg); >>>> else >>>> return le32_to_cpu(trb->link.control) & LINK_TOGGLE; >>>> @@ -109,7 +109,7 @@ static int last_trb(struct xhci_hcd *xhci, struct xhci_ring *ring, >>>> struct xhci_segment *seg, union xhci_trb *trb) >>>> { >>>> if (ring == xhci->event_ring) >>>> - return trb == &seg->trbs[TRBS_PER_SEGMENT]; >>>> + return trb == &seg->trbs[TRBS_PER_SEGMENT - 1]; >>>> else >>>> return TRB_TYPE_LINK_LE32(trb->link.control); >>>> } >>>> >>> >>> Thanks, this needs to be fixed, but there are some changes needed to inc_enq() >>> as well together with this fix. >>> Otherwise the last TRB of a event ring won't be used >> >> Any idea how a problem due to this could be triggered ? So far I can't >> get any failures. BTW, the last TRB *will* be used by the HW anyway, no? >> A truer argument would be that Linux XHCI driver might miss handling >> that event. I still can't get this to fail. It seems to me that it's >> pretty much impossible for us to miss an event. >> >> If we consider what happens from the time an IRQ fires, we will have the >> following sequence of events: >> >> MSI/MSI-X/normal IRQ asserted >> xhci_irq() >> while(xhci_handle_event(xhci) > 0); >> event = xhci->event_ring->dequeue; >> /* handle the event given its type */ >> if (update_ptrs) inc_deq(); >> ring->deq_updates++; >> do { >> if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) >> if (ring->type == EVENT && last_trb_on_last_seg()) >> ring-cycle_state ^= 1; >> ring->deq_seg = ring->deq_seg->next; >> ring->dequeue = ring->deq_seg->trbs; >> else >> ring->dequeue++ >> } while (last_trb()) >> > > event ring: > trb[0] > ... > trb[254] > trb[255] = the last trb on the ring, still used for events as event > ring has no link trb. > > One scenario where the dequeue pointer, both the actual hw one, and > the sw one is at trb[254] > > xhci hw writes an event at trb[254], event ring dequeue is at trb[254] > - interrupt, we handle the event at dequeue trb[254] > - call inc_deq(), last_trb(trb[254]) check is false so instead we increase the dequeue. > - dequeue is now at trb[255], while(last_trb(trb[255]) will now be true, so go back to "do{". > - last_trb(trb[255]) check is now true, so we jump to next segment. > > so now the driver thinks the dequeue pointer is at the next segment, > but hw is going to write the next event at trb[255] which the driver > is not going to notice -> drive misses one event. Good point, but it seems to me the problem is that do {} while() loop on inc_deq(), how can we have more than one last TRB ? Granted, that check is different for non-EVENT rings (which checks for a link TRB instead - also wrong check as Link might not be last). Also, I haven't managed to trigger any failures from this but I have to fix my commit log as the lock-up -> timeout -> Reset on the mass storage device still happens even with $subject, albeit not as frequently. Anyway, seems like below is also needed (maybe as patch 1 on a series of two patches ?) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index ba610a72c396..6f991b7c4249 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -153,24 +153,21 @@ static void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring) !last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) ring->num_trbs_free++; - do { + ring->dequeue++; + + if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) { /* * Update the dequeue pointer further if that was a link TRB or * we're at the end of an event ring segment (which doesn't have * link TRBS) */ - if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) { - if (ring->type == TYPE_EVENT && - last_trb_on_last_seg(xhci, ring, - ring->deq_seg, ring->dequeue)) { - ring->cycle_state ^= 1; - } - ring->deq_seg = ring->deq_seg->next; - ring->dequeue = ring->deq_seg->trbs; - } else { - ring->dequeue++; - } - } while (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)); + if (ring->type == TYPE_EVENT && last_trb_on_last_seg(xhci, ring, + ring->deq_seg, ring->dequeue)) + ring->cycle_state ^= 1; + + ring->deq_seg = ring->deq_seg->next; + ring->dequeue = ring->deq_seg->trbs; + } } /* @@ -207,7 +204,7 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring, /* Update the dequeue pointer further if that was a link TRB or we're at * the end of an event ring segment (which doesn't have link TRBS) */ - while (last_trb(xhci, ring, ring->enq_seg, next)) { + if (last_trb(xhci, ring, ring->enq_seg, next)) { if (ring->type != TYPE_EVENT) { /* * If the caller doesn't plan on enqueueing more @@ -218,7 +215,7 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring, * the TD at the top of the ring. */ if (!chain && !more_trbs_coming) - break; + return; /* If we're not dealing with 0.95 hardware or * isoc rings on AMD 0.96 host, >> Oh; and, BTW, a further cleanup would be: > > Yes, looking at this there are several cleanups needed. For example inc_enq() > should actually not bother with event ring at all. HW is the producer in the > event ring, xhci driver (consumer) should just take care of the dequeue pointer. > > We shouldn't mix link and last trbs helpers like we currently do, I'd prefer > functions like: > > last_trb_on_seg(trb) > last_trb_on_ring(trb, ring) > is_link_trb(trb) > is_cycle_toggle_link_trb(trb) are you doing these changes ? If you are, do you mind prepending all of these functions with xhci_ or something like that ? It really helps grepping, ftrace, dump_stack(), etc. -- balbi
Attachment:
signature.asc
Description: PGP signature