Hi, OK, this one port dead story seems like an isolated, unrelated incident. I have been suspending and resuming the machine many times, plugging the drive into both ports and the controller seems rock solid with this new patch. The fact that I was able to suspend every time just re-confirms that it keeps responding. As usual, /var/log/messages is attached. Julian W dniu 01.12.2011 02:14, Julian Sikorski pisze: > I am having a mixed answer. Here is what I did: > > I plugged the drive in > -disconnected it > - suspended/resumed > - reconnected > - used it for 90 minutes > Everything was fine, which seems better than an unpatched kernel case. I > then continued: > - suspended it with the drive connected (around 01:51:52) > - resumed, the drive still worked > Unfortunalely, the second port stopped responding (01:57:05). Another > one or two suspend-resume cycles did not bring it back to life, but the > first port was still working fine. > I am not sure if this is not a different problem, since normally after a > failure the system would not suspend at all. This time one port just > seem to be acting out. Oddly enough, nothing was ever connected to it > during this session. I will keep testing since something might > definitely be going on (it is definitely more stable, but let's hold on > with the final call). > In the meantime, please have a look at /var/log/messages, maybe there is > something interesting in it. > > Regards, > Julian > > > W dniu 30.11.2011 19:29, Sarah Sharp pisze: >> Good catch! >> >> Is there any chance that Julian's instability after system resume is >> related to this bug? If you forced a reset resume, the xHCI driver >> would have reallocated the command ring with a proper link TRB. Without >> the reset resume, the zeroed command ring wouldn't have a link TRB and >> the host controller would have eventually walked off the end of the >> command ring. That might explain why the host controller stopped >> responding to the stop endpoint command without the reset resume, but >> only after a very long time (half an hour). >> >> Julian, can you revert Andiry's patch to add the reset resume, add this >> patch instead, and see if it fixes your instability issues? If so, I >> think this is a better fix. >> >> Sarah Sharp >> >> On Wed, Nov 30, 2011 at 04:37:41PM +0800, Andiry Xu wrote: >>> When system enters suspend, xHCI driver clears command ring by writing zero >>> to all the TRBs. However, this also writes zero to the Link TRB, and the ring >>> is mangled. This may cause driver accesses wrong memory address and the >>> result is unpredicted. >>> >>> When clear the command ring, keep the last Link TRB intact, only clear its >>> cycle bit. This should fix the "command ring full" issue reported by Oliver >>> Neukum. >>> >>> This should be backported to stable kernels as old as 2.6.37, since the >>> commit 89821320 "xhci: Fix command ring replay after resume" is merged. >>> >>> Signed-off-by: Andiry Xu <andiry.xu@xxxxxxx> >>> --- >>> drivers/usb/host/xhci.c | 5 ++++- >>> 1 files changed, 4 insertions(+), 1 deletions(-) >>> >>> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c >>> index aa94c01..a1afb7c 100644 >>> --- a/drivers/usb/host/xhci.c >>> +++ b/drivers/usb/host/xhci.c >>> @@ -711,7 +711,10 @@ static void xhci_clear_command_ring(struct xhci_hcd *xhci) >>> ring = xhci->cmd_ring; >>> seg = ring->deq_seg; >>> do { >>> - memset(seg->trbs, 0, SEGMENT_SIZE); >>> + memset(seg->trbs, 0, >>> + sizeof(union xhci_trb) * (TRBS_PER_SEGMENT - 1)); >>> + seg->trbs[TRBS_PER_SEGMENT - 1].link.control &= >>> + cpu_to_le32(~TRB_CYCLE); >>> seg = seg->next; >>> } while (seg != ring->deq_seg); >>> >>> -- >>> 1.7.4.1 >>> >>> >
Attachment:
messages.xz
Description: application/xz