On Thu, Jul 26, 2012 at 07:16:38AM -0700, Sarah Sharp wrote: > On Wed, Jul 25, 2012 at 10:24:26PM -0700, Matthew Hall wrote: > > On Wed, Jul 25, 2012 at 10:46:37AM -0700, Sarah Sharp wrote: > > I think I'm in the same state as Gary only different. ;) I am getting weird > > behavior with the quirk patch. The device popped up for about half a minute, > > then went haywire and disappeared after the exfat FUSE FS tried to mount it > > and begin reading the volume successfully for a short moment. > > I think now that your host is working, you're running into a separate > bug that several other people have discovered. Your mass storage device > stalls a lot of SCSI commands (which is fine), and that eventually hits > a bug when the stall happens just before the end of a transfer ring > segment boundary. > > The bug causes this message to appear: > ERROR Transfer event TRB DMA ptr not part of current TD Ok, I think I have the root cause of this message, and it's a nasty little bug. Can you apply the attached patch (instead of the previous debug patch) and test? I think your disk will work once it's applied, but please double check that you don't see any more of that ERROR message. Sarah Sharp
>From 568468d68a3719c0e52e4e85b4fa5ec4324bff70 Mon Sep 17 00:00:00 2001 From: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> Date: Thu, 26 Jul 2012 12:03:59 -0700 Subject: [PATCH] xhci: Fix bug after deq ptr set to link TRB. This patch fixes a particularly nasty bug that was revealed by the ring expansion patches. The bug has been present since the very beginning of the xHCI driver history, and could have caused general protection faults from bad memory accesses. The first thing to note is that a Set TR Dequeue Pointer command can move the dequeue pointer to a link TRB, if the canceled or stalled transfer TD ended just before a link TRB. The function to increment the dequeue pointer, inc_deq, was written before cancellation and stall support was added. It assumed that the dequeue pointer could never point to a link TRB. It would unconditionally increment the dequeue pointer at the start of the function, check if the pointer was now on a link TRB, and move it to the top of the next segment if so. This means that if a Set TR Dequeue Point command moved the dequeue pointer to a link TRB, a subsequent call to inc_deq() would move the pointer off the segment and into la-la-land. It would then read from that memory to determine if it was a link TRB. Other functions would often call inc_deq() until the dequeue pointer matched some other pointer, which means this function would quite happily read all of system memory before wrapping around to the right pointer value. The only reason the original code worked at all is because there was only one ring segment. With the ring expansion patches, the dequeue pointer would eventually wrap into place, but the dequeue segment would be out-of-sync. On the second TD after the dequeue pointer was moved to a link TRB, trb_in_td() would fail (because the dequeue pointer and dequeue segment were out-of-sync), and this message would appear: ERROR Transfer event TRB DMA ptr not part of current TD This should fix bugzilla entry 4333 (option-based modem unhappy on USB 3.0 port: "Transfer event TRB DMA ptr not part of current TD", "rejecting I/O to offline device"), https://bugzilla.kernel.org/show_bug.cgi?id=43333 and possibly other general protection fault bugs as well. This patch should be backported to kernels as old as 2.6.31. A separate patch will be created for kernels older than 3.4, since inc_deq was modified in 3.4 and this patch will not apply. Signed-off-by: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> Cc: stable@xxxxxxxxxxxxxxx --- drivers/usb/host/xhci-ring.c | 36 ++++++++++++++++++++++-------------- 1 files changed, 22 insertions(+), 14 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 8275645..6ec3633 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -145,29 +145,37 @@ static void next_trb(struct xhci_hcd *xhci, */ static void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring) { - union xhci_trb *next; unsigned long long addr; ring->deq_updates++; - /* If this is not event ring, there is one more usable TRB */ + /* + * If this is not event ring, and the dequeue pointer + * is not on a link TRB, there is one more usable TRB + */ if (ring->type != TYPE_EVENT && !last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) ring->num_trbs_free++; - next = ++(ring->dequeue); - /* Update the dequeue pointer further if that was a link TRB or we're at - * the end of an event ring segment (which doesn't have link TRBS) - */ - while (last_trb(xhci, ring, ring->deq_seg, next)) { - if (ring->type == TYPE_EVENT && last_trb_on_last_seg(xhci, - ring, ring->deq_seg, next)) { - ring->cycle_state = (ring->cycle_state ? 0 : 1); + do { + /* + * Update the dequeue pointer further if that was a link TRB or + * we're at the end of an event ring segment (which doesn't have + * link TRBS) + */ + if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) { + if (ring->type == TYPE_EVENT && + last_trb_on_last_seg(xhci, ring, + ring->deq_seg, ring->dequeue)) { + ring->cycle_state = (ring->cycle_state ? 0 : 1); + } + ring->deq_seg = ring->deq_seg->next; + ring->dequeue = ring->deq_seg->trbs; + } else { + ring->dequeue++; } - ring->deq_seg = ring->deq_seg->next; - ring->dequeue = ring->deq_seg->trbs; - next = ring->dequeue; - } + } while (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)); + addr = (unsigned long long) xhci_trb_virt_to_dma(ring->deq_seg, ring->dequeue); } -- 1.7.9