Re: [PATCH 1/2] xhci: make sure TRB is fully written before giving it to the controller

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/15/21 7:50 PM, David Laight wrote:
> From: Sergei Shtylyov
>> Sent: 15 January 2021 16:40
>>
>> On 1/15/21 7:19 PM, Mathias Nyman wrote:
>>
>>> Once the command ring doorbell is rung the xHC controller will parse all
>>> command TRBs on the command ring that have the cycle bit set properly.
>>>
>>> If the driver just started writing the next command TRB to the ring when
>>> hardware finished the previous TRB, then HW might fetch an incomplete TRB
>>> as long as its cycle bit set correctly.
>>>
>>> A command TRB is 16 bytes (128 bits) long.
>>> Driver writes the command TRB in four 32 bit chunks, with the chunk
>>> containing the cycle bit last. This does however not guarantee that
>>> chunks actually get written in that order.
>>>
>>> This was detected in stress testing when canceling URBs with several
>>> connected USB devices.
>>> Two consecutive "Set TR Dequeue pointer" commands got queued right
>>> after each other, and the second one was only partially written when
>>> the controller parsed it, causing the dequeue pointer to be set
>>> to bogus values. This was seen as error messages:
>>>
>>> "Mismatch between completed Set TR Deq Ptr command & xHCI internal state"
>>>
>>> Solution is to add a write memory barrier before writing the cycle bit.
>>>
>>> Cc: <stable@xxxxxxxxxxxxxxx>
>>> Tested-by: Ross Zwisler <zwisler@xxxxxxxxxx>
>>> Signed-off-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
>>> ---
>>>  drivers/usb/host/xhci-ring.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>>> index 5677b81c0915..cf0c93a90200 100644
>>> --- a/drivers/usb/host/xhci-ring.c
>>> +++ b/drivers/usb/host/xhci-ring.c
>>> @@ -2931,6 +2931,8 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
>>>  	trb->field[0] = cpu_to_le32(field1);
>>>  	trb->field[1] = cpu_to_le32(field2);
>>>  	trb->field[2] = cpu_to_le32(field3);
>>> +	/* make sure TRB is fully written before giving it to the controller */
>>> +	wmb();
>>
>>    Have you tried the lighter barrier, dma_wmb()? IIRC, it exists for these exact cases...
> 
> Isn't dma_wmb() needed between the last memory write and the io_write to the doorbell?

   No.

> Here we need to ensure the two memory writes aren't re-ordered.

   No, we need all 3 ring memory writes to be ordered such that they all happen before the 4th
write. It's not wonder this bug hasn't been noticed before -- x86 has strong write ordering
unlike ARM/etc.

> Apart from alpha isn't a barrier() likely to be enough for that.

   Not sure -- we don't have any barriers before the equivalents of a doorbell write
in e.g. the Renesas Ehter driver.

> It is worth checking that the failing compiles didn't have the writes reordered.

  The writes are reordered not because of the compiler -- the read/write reordering is a
CPU feature (on at least non-x86). :-)

> 	David
> 
>>
>>>  	trb->field[3] = cpu_to_le32(field4);
>>>
>>>  	trace_xhci_queue_trb(ring, trb);

MBR, Sergei



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux