I am having a mixed answer. Here is what I did: I plugged the drive in -disconnected it - suspended/resumed - reconnected - used it for 90 minutes Everything was fine, which seems better than an unpatched kernel case. I then continued: - suspended it with the drive connected (around 01:51:52) - resumed, the drive still worked Unfortunalely, the second port stopped responding (01:57:05). Another one or two suspend-resume cycles did not bring it back to life, but the first port was still working fine. I am not sure if this is not a different problem, since normally after a failure the system would not suspend at all. This time one port just seem to be acting out. Oddly enough, nothing was ever connected to it during this session. I will keep testing since something might definitely be going on (it is definitely more stable, but let's hold on with the final call). In the meantime, please have a look at /var/log/messages, maybe there is something interesting in it. Regards, Julian W dniu 30.11.2011 19:29, Sarah Sharp pisze: > Good catch! > > Is there any chance that Julian's instability after system resume is > related to this bug? If you forced a reset resume, the xHCI driver > would have reallocated the command ring with a proper link TRB. Without > the reset resume, the zeroed command ring wouldn't have a link TRB and > the host controller would have eventually walked off the end of the > command ring. That might explain why the host controller stopped > responding to the stop endpoint command without the reset resume, but > only after a very long time (half an hour). > > Julian, can you revert Andiry's patch to add the reset resume, add this > patch instead, and see if it fixes your instability issues? If so, I > think this is a better fix. > > Sarah Sharp > > On Wed, Nov 30, 2011 at 04:37:41PM +0800, Andiry Xu wrote: >> When system enters suspend, xHCI driver clears command ring by writing zero >> to all the TRBs. However, this also writes zero to the Link TRB, and the ring >> is mangled. This may cause driver accesses wrong memory address and the >> result is unpredicted. >> >> When clear the command ring, keep the last Link TRB intact, only clear its >> cycle bit. This should fix the "command ring full" issue reported by Oliver >> Neukum. >> >> This should be backported to stable kernels as old as 2.6.37, since the >> commit 89821320 "xhci: Fix command ring replay after resume" is merged. >> >> Signed-off-by: Andiry Xu <andiry.xu@xxxxxxx> >> --- >> drivers/usb/host/xhci.c | 5 ++++- >> 1 files changed, 4 insertions(+), 1 deletions(-) >> >> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c >> index aa94c01..a1afb7c 100644 >> --- a/drivers/usb/host/xhci.c >> +++ b/drivers/usb/host/xhci.c >> @@ -711,7 +711,10 @@ static void xhci_clear_command_ring(struct xhci_hcd *xhci) >> ring = xhci->cmd_ring; >> seg = ring->deq_seg; >> do { >> - memset(seg->trbs, 0, SEGMENT_SIZE); >> + memset(seg->trbs, 0, >> + sizeof(union xhci_trb) * (TRBS_PER_SEGMENT - 1)); >> + seg->trbs[TRBS_PER_SEGMENT - 1].link.control &= >> + cpu_to_le32(~TRB_CYCLE); >> seg = seg->next; >> } while (seg != ring->deq_seg); >> >> -- >> 1.7.4.1 >> >>
Attachment:
messages.xz
Description: application/xz