On 03.05.2016 14:55, Felipe Balbi wrote:
Hi,
Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> writes:
On 03.05.2016 13:30, Felipe Balbi wrote:
When trying to access our last TRB, XHCI was
actually reading memory outside of the TRB
array/ring due to an off-by-one error.
This patch fixes that error and has the side effect
of also fixing some rare situations where long mass
storage transfers would timeout and XHCI would reset
the mass storage device before continuing.
Signed-off-by: Felipe Balbi <felipe.balbi@xxxxxxxxxxxxxxx>
---
drivers/usb/host/xhci-ring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 6c41dbbf9f2f..ba610a72c396 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -95,7 +95,7 @@ static bool last_trb_on_last_seg(struct xhci_hcd *xhci, struct xhci_ring *ring,
struct xhci_segment *seg, union xhci_trb *trb)
{
if (ring == xhci->event_ring)
- return (trb == &seg->trbs[TRBS_PER_SEGMENT]) &&
+ return (trb == &seg->trbs[TRBS_PER_SEGMENT - 1]) &&
(seg->next == xhci->event_ring->first_seg);
else
return le32_to_cpu(trb->link.control) & LINK_TOGGLE;
@@ -109,7 +109,7 @@ static int last_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
struct xhci_segment *seg, union xhci_trb *trb)
{
if (ring == xhci->event_ring)
- return trb == &seg->trbs[TRBS_PER_SEGMENT];
+ return trb == &seg->trbs[TRBS_PER_SEGMENT - 1];
else
return TRB_TYPE_LINK_LE32(trb->link.control);
}
Thanks, this needs to be fixed, but there are some changes needed to inc_enq()
as well together with this fix.
Otherwise the last TRB of a event ring won't be used
Any idea how a problem due to this could be triggered ? So far I can't
get any failures. BTW, the last TRB *will* be used by the HW anyway, no?
A truer argument would be that Linux XHCI driver might miss handling
that event. I still can't get this to fail. It seems to me that it's
pretty much impossible for us to miss an event.
If we consider what happens from the time an IRQ fires, we will have the
following sequence of events:
MSI/MSI-X/normal IRQ asserted
xhci_irq()
while(xhci_handle_event(xhci) > 0);
event = xhci->event_ring->dequeue;
/* handle the event given its type */
if (update_ptrs) inc_deq();
ring->deq_updates++;
do {
if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue))
if (ring->type == EVENT && last_trb_on_last_seg())
ring-cycle_state ^= 1;
ring->deq_seg = ring->deq_seg->next;
ring->dequeue = ring->deq_seg->trbs;
else
ring->dequeue++
} while (last_trb())
event ring:
trb[0]
...
trb[254]
trb[255] = the last trb on the ring, still used for events as event ring has no link trb.
One scenario where the dequeue pointer, both the actual hw one, and the sw one is at trb[254]
xhci hw writes an event at trb[254], event ring dequeue is at trb[254]
- interrupt, we handle the event at dequeue trb[254]
- call inc_deq(), last_trb(trb[254]) check is false so instead we increase the dequeue.
- dequeue is now at trb[255], while(last_trb(trb[255]) will now be true, so go back to "do{".
- last_trb(trb[255]) check is now true, so we jump to next segment.
so now the driver thinks the dequeue pointer is at the next segment, but hw is going to write
the next event at trb[255] which the driver is not going to notice -> drive misses one event.
With the patch above, here's (again, to make it easier to comment) what
last_trb() and last_trb_on_last_seg() look like:
static bool last_trb_on_last_seg(struct xhci_hcd *xhci, struct xhci_ring *ring,
struct xhci_segment *seg, union xhci_trb *trb)
{
if (ring == xhci->event_ring)
return (trb == &seg->trbs[TRBS_PER_SEGMENT - 1]) &&
(seg->next == xhci->event_ring->first_seg);
else
return le32_to_cpu(trb->link.control) & LINK_TOGGLE;
}
for a ring segment this evaluates to:
return trb == last_trb && seg->next == xhci->event_ring->first_seg;
IOW:
if this is the last_trb on current segment and is also the last
trb on the last segment, return true.
in fact, trb == &seg->trbs[TRBS_PER_SEGMENT - 1] is a redundant check.
static int last_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
struct xhci_segment *seg, union xhci_trb *trb)
{
if (ring == xhci->event_ring)
return trb == &seg->trbs[TRBS_PER_SEGMENT - 1];
else
return TRB_TYPE_LINK_LE32(trb->link.control);
}
likewise, for an event_ring, this turns out to be:
return trb == last_trb;
IOW, before this patch, we were potentially checking memory outside of
the TRB ring which tells me that, unless we were lucky, last_trb() and
last_trb_on_last_seg() would always evaluate to false.
Oh; and, BTW, a further cleanup would be:
Yes, looking at this there are several cleanups needed. For example inc_enq()
should actually not bother with event ring at all. HW is the producer in the
event ring, xhci driver (consumer) should just take care of the dequeue pointer.
We shouldn't mix link and last trbs helpers like we currently do, I'd prefer
functions like:
last_trb_on_seg(trb)
last_trb_on_ring(trb, ring)
is_link_trb(trb)
is_cycle_toggle_link_trb(trb)
-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html