On 1/15/21 7:19 PM, Mathias Nyman wrote: > Once the command ring doorbell is rung the xHC controller will parse all > command TRBs on the command ring that have the cycle bit set properly. > > If the driver just started writing the next command TRB to the ring when > hardware finished the previous TRB, then HW might fetch an incomplete TRB > as long as its cycle bit set correctly. > > A command TRB is 16 bytes (128 bits) long. > Driver writes the command TRB in four 32 bit chunks, with the chunk > containing the cycle bit last. This does however not guarantee that > chunks actually get written in that order. > > This was detected in stress testing when canceling URBs with several > connected USB devices. > Two consecutive "Set TR Dequeue pointer" commands got queued right > after each other, and the second one was only partially written when > the controller parsed it, causing the dequeue pointer to be set > to bogus values. This was seen as error messages: > > "Mismatch between completed Set TR Deq Ptr command & xHCI internal state" > > Solution is to add a write memory barrier before writing the cycle bit. > > Cc: <stable@xxxxxxxxxxxxxxx> > Tested-by: Ross Zwisler <zwisler@xxxxxxxxxx> > Signed-off-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> > --- > drivers/usb/host/xhci-ring.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index 5677b81c0915..cf0c93a90200 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -2931,6 +2931,8 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring, > trb->field[0] = cpu_to_le32(field1); > trb->field[1] = cpu_to_le32(field2); > trb->field[2] = cpu_to_le32(field3); > + /* make sure TRB is fully written before giving it to the controller */ > + wmb(); Have you tried the lighter barrier, dma_wmb()? IIRC, it exists for these exact cases... > trb->field[3] = cpu_to_le32(field4); > > trace_xhci_queue_trb(ring, trb); MBR, Sergei