From: Sergei Shtylyov > Sent: 15 January 2021 16:40 > > On 1/15/21 7:19 PM, Mathias Nyman wrote: > > > Once the command ring doorbell is rung the xHC controller will parse all > > command TRBs on the command ring that have the cycle bit set properly. > > > > If the driver just started writing the next command TRB to the ring when > > hardware finished the previous TRB, then HW might fetch an incomplete TRB > > as long as its cycle bit set correctly. > > > > A command TRB is 16 bytes (128 bits) long. > > Driver writes the command TRB in four 32 bit chunks, with the chunk > > containing the cycle bit last. This does however not guarantee that > > chunks actually get written in that order. > > > > This was detected in stress testing when canceling URBs with several > > connected USB devices. > > Two consecutive "Set TR Dequeue pointer" commands got queued right > > after each other, and the second one was only partially written when > > the controller parsed it, causing the dequeue pointer to be set > > to bogus values. This was seen as error messages: > > > > "Mismatch between completed Set TR Deq Ptr command & xHCI internal state" > > > > Solution is to add a write memory barrier before writing the cycle bit. > > > > Cc: <stable@xxxxxxxxxxxxxxx> > > Tested-by: Ross Zwisler <zwisler@xxxxxxxxxx> > > Signed-off-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> > > --- > > drivers/usb/host/xhci-ring.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > > index 5677b81c0915..cf0c93a90200 100644 > > --- a/drivers/usb/host/xhci-ring.c > > +++ b/drivers/usb/host/xhci-ring.c > > @@ -2931,6 +2931,8 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring, > > trb->field[0] = cpu_to_le32(field1); > > trb->field[1] = cpu_to_le32(field2); > > trb->field[2] = cpu_to_le32(field3); > > + /* make sure TRB is fully written before giving it to the controller */ > > + wmb(); > > Have you tried the lighter barrier, dma_wmb()? IIRC, it exists for these exact cases... Isn't dma_wmb() needed between the last memory write and the io_write to the doorbell? Here we need to ensure the two memory writes aren't re-ordered. Apart from alpha isn't a barrier() likely to be enough for that. It is worth checking that the failing compiles didn't have the writes reordered. David > > > trb->field[3] = cpu_to_le32(field4); > > > > trace_xhci_queue_trb(ring, trb); > > MBR, Sergei - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)