Re: [PATCH] xHCI: fix bug in xhci_clear_command_ring()

Julian Sikorski <belegdol@xxxxxxxxx> · Thu, 01 Dec 2011 02:14:35 +0100

My results are mixed. Here is what I did:

- plugged the drive in
- disconnected it
- suspended/resumed
- reconnected
- used it for 90 minutes
Everything was fine, which already seems better than with an unpatched
kernel. I then continued:
- suspended the system with the drive connected (around 01:51:52)
- resumed, the drive still worked
Unfortunately, the second port stopped responding (01:57:05). Another
one or two suspend-resume cycles did not bring it back to life, but the
first port was still working fine.
I am not sure whether this is a different problem, since normally after a
failure the system would not suspend at all. This time just one port
seems to be acting up. Oddly enough, nothing was ever connected to it
during this session. I will keep testing, since something does seem to
be going on (the system is definitely more stable, but let's hold off on
the final call).
In the meantime, please have a look at the attached /var/log/messages;
maybe there is something interesting in it.

Regards,
Julian

On 30.11.2011 19:29, Sarah Sharp wrote:
> Good catch!
> 
> Is there any chance that Julian's instability after system resume is
> related to this bug?  If you forced a reset resume, the xHCI driver
> would have reallocated the command ring with a proper link TRB.  Without
> the reset resume, the zeroed command ring wouldn't have a link TRB and
> the host controller would have eventually walked off the end of the
> command ring.  That might explain why the host controller stopped
> responding to the stop endpoint command without the reset resume, but
> only after a very long time (half an hour).
> 
> Julian, can you revert Andiry's patch to add the reset resume, add this
> patch instead, and see if it fixes your instability issues?  If so, I
> think this is a better fix.
> 
> Sarah Sharp
> 
> On Wed, Nov 30, 2011 at 04:37:41PM +0800, Andiry Xu wrote:
>> When the system enters suspend, the xHCI driver clears the command ring by
>> writing zero to all the TRBs. However, this also zeroes the Link TRB, so the
>> ring is mangled. This may cause the driver to access a wrong memory address,
>> with unpredictable results.
>>
>> When clearing the command ring, keep the last Link TRB intact and only clear
>> its cycle bit. This should fix the "command ring full" issue reported by
>> Oliver Neukum.
>>
>> This should be backported to stable kernels as old as 2.6.37, since
>> commit 89821320 "xhci: Fix command ring replay after resume" was merged.
>>
>> Signed-off-by: Andiry Xu <andiry.xu@xxxxxxx>
>> ---
>>  drivers/usb/host/xhci.c |    5 ++++-
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
>> index aa94c01..a1afb7c 100644
>> --- a/drivers/usb/host/xhci.c
>> +++ b/drivers/usb/host/xhci.c
>> @@ -711,7 +711,10 @@ static void xhci_clear_command_ring(struct xhci_hcd *xhci)
>>  	ring = xhci->cmd_ring;
>>  	seg = ring->deq_seg;
>>  	do {
>> -		memset(seg->trbs, 0, SEGMENT_SIZE);
>> +		memset(seg->trbs, 0,
>> +			sizeof(union xhci_trb) * (TRBS_PER_SEGMENT - 1));
>> +		seg->trbs[TRBS_PER_SEGMENT - 1].link.control &=
>> +			cpu_to_le32(~TRB_CYCLE);
>>  		seg = seg->next;
>>  	} while (seg != ring->deq_seg);
>>  
>> -- 
>> 1.7.4.1
>>
>>
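
For readers following the thread, the difference between the old and the
patched clearing logic can be sketched outside the kernel. This is only a
minimal user-space illustration under stated assumptions, not the driver
code: every demo_* name, the 8-TRB segment size, and the single
self-linked segment are made up for the example, and the endianness
conversion (cpu_to_le32 in the patch) is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
#define DEMO_TRBS_PER_SEGMENT 8
#define DEMO_TRB_CYCLE        (1u << 0)     /* cycle bit in the control word */
#define DEMO_TRB_TYPE_MASK    (0x3fu << 10) /* TRB type field */
#define DEMO_TRB_LINK         (6u << 10)    /* TRB type: Link */

struct demo_trb {
    uint32_t field[3];
    uint32_t control;
};

struct demo_segment {
    struct demo_trb trbs[DEMO_TRBS_PER_SEGMENT];
    struct demo_segment *next;
};

/* Build a one-segment ring whose last TRB is a link TRB pointing back to it. */
void demo_init_ring(struct demo_segment *seg)
{
    assert(seg != NULL);
    memset(seg, 0, sizeof(*seg));
    seg->next = seg;
    seg->trbs[DEMO_TRBS_PER_SEGMENT - 1].control = DEMO_TRB_LINK | DEMO_TRB_CYCLE;
}

/* Old behaviour: zero the whole segment, which also wipes the link TRB. */
void demo_clear_all(struct demo_segment *first)
{
    struct demo_segment *seg = first;

    assert(first != NULL);
    do {
        memset(seg->trbs, 0, sizeof(seg->trbs));
        seg = seg->next;
    } while (seg != first);
}

/* Patched behaviour: zero all but the last TRB, then clear only its cycle bit. */
void demo_clear_keep_link(struct demo_segment *first)
{
    struct demo_segment *seg = first;

    assert(first != NULL);
    do {
        memset(seg->trbs, 0,
               sizeof(struct demo_trb) * (DEMO_TRBS_PER_SEGMENT - 1));
        seg->trbs[DEMO_TRBS_PER_SEGMENT - 1].control &= ~DEMO_TRB_CYCLE;
        seg = seg->next;
    } while (seg != first);
}

/* Returns nonzero if the last TRB of the segment is still a link TRB. */
int demo_has_link(const struct demo_segment *seg)
{
    uint32_t control = seg->trbs[DEMO_TRBS_PER_SEGMENT - 1].control;

    return (control & DEMO_TRB_TYPE_MASK) == DEMO_TRB_LINK;
}
```

Calling demo_clear_all() on an initialised ring leaves demo_has_link()
returning 0, which corresponds to the ring without a link TRB described
above. demo_clear_keep_link() keeps the link TRB and only drops its cycle
bit.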

Attachment: messages.xz
Description: application/xz