Re: xhci_hcd ERROR no room on ep ring, xhci_hcd WARN Event TRB for slot 1 ep 4 with no TDs queued?

Sander Eikelenboom <linux@xxxxxxxxxxxxxx> · Tue, 10 Aug 2010 22:21:10 +0200

Hi Sarah,

Just tried 2.6.35 final + the isoc patches, with debugfs and some debug options (kmemleak, spinlock, dma-api) enabled.
On bare metal, so without Xen interfering.
But it seems to be even worse than with Xen: the console gets completely flooded with lines like these:

Aug 10 22:01:01 localhost kernel: [  322.691778] xhci_hcd 0000:04:00.0: @206c60e0 2082c7b4 00000000 00040b4c 80001424
Aug 10 22:01:01 localhost kernel: [  322.691778] xhci_hcd 0000:04:00.0: @206c60f0 20840000 00000000 00040b4c 80001424
Aug 10 22:01:01 localhost kernel: [  322.691778] xhci_hcd 0000:04:00.0: @206c6100 20840b4c 00000000 00040b4c 80001424
Aug 10 22:01:01 localhost kernel: [  322.691778] xhci_hcd 0000:04:00.0: @206c6110 20841698 00000000 00040b4c 80001424
Aug 10 22:01:01 localhost kernel: [  322.691778] xhci_hcd 0000:04:00.0: @206c6120 208421e4 00000000 00040b4c 80001424
Aug 10 22:01:01 localhost kernel: [  322.691778] xhci_hcd 0000:04:00.0: @206c6130 20842d30 00000000 00040b4c 80001424

And I had a hard time stopping ffmpeg from grabbing.

--
Sander

Tuesday, August 10, 2010, 7:41:14 PM, you wrote:

> On Sun, Aug 08, 2010 at 12:44:19PM +0200, Sander Eikelenboom wrote:
>> Hello Andiry,
>> 
>> Attached is the syslog output with the DMA error on a kernel with patch3 applied.

> What's the base kernel you're patching?  Is it from my master branch, or
> are you taking a stock kernel and patching it with Andiry's isoc
> patches?  Which driver is loaded for your device?

>> BTW, I haven't seen a pull request for xhci and the isoc patches for the 2.6.36 merge window yet?

> Greg has the patches in his queue (he has a quilt patchset, so he
> doesn't ask for pull requests).

> So the interesting part of your isoc endpoint ring is this:
> @c7df5660 c7a2a5d0 00000000 00040b4c 80001424
> @c7df5670 c7a2b11c 00000000 00040b4c 80001424
> @c7df5680 c7a2bc68 00000000 00040b4c 80001424
> @c7df5690 c7a2c7b4 00000000 00040b4c 80001424
> @c7df56a0 c7f103d4 00000000 00040b4c 80001425
> @c7df56b0 c7f10f20 00000000 00040b4c 80001425
> ...
> ERROR Transfer event TRB DMA ptr not part of current TD
> ep 4, skip is not set
> isoc ep
> ep_ring->deq_seg = ffff88001f7362a0
> ep_ring->dequeue = ffff88001f798120
> td->last_trb = ffff88001f798120
> event_dma = 0xc7df5680, @ffff88001e6eb4b0
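Each dump line is the TRB's own address followed by its four 32-bit words. The following is a minimal sketch that decodes one dumped Isoch TRB. It assumes the Isoch TRB layout from the xHCI 1.0 specification: buffer pointer in words 0-1, transfer length and TD size in word 2, and flags and TRB type in word 3. The struct and function names are hypothetical, not xhci_hcd's.

```c
#include <stdint.h>

/* Hypothetical decoded view of an xHCI Isoch TRB (TRB type 5).
 * Bit positions follow our reading of the xHCI 1.0 spec. */
struct isoc_trb_fields {
    uint64_t buffer;   /* words 0-1: data buffer pointer */
    uint32_t length;   /* word 2 bits 16:0: TRB transfer length */
    uint32_t td_size;  /* word 2 bits 21:17: TD size */
    unsigned cycle;    /* word 3 bit 0: cycle bit */
    unsigned isp;      /* word 3 bit 2: interrupt on short packet */
    unsigned ioc;      /* word 3 bit 5: interrupt on completion */
    unsigned type;     /* word 3 bits 15:10: TRB type */
    unsigned sia;      /* word 3 bit 31: start isoch ASAP */
};

/* Decode the four words exactly as they appear after "@addr" in the dump. */
static struct isoc_trb_fields decode_isoc_trb(uint32_t w0, uint32_t w1,
                                              uint32_t w2, uint32_t w3)
{
    struct isoc_trb_fields f;

    f.buffer  = ((uint64_t)w1 << 32) | w0;
    f.length  = w2 & 0x1ffff;
    f.td_size = (w2 >> 17) & 0x1f;
    f.cycle   = w3 & 0x1;
    f.isp     = (w3 >> 2) & 0x1;
    f.ioc     = (w3 >> 5) & 0x1;
    f.type    = (w3 >> 10) & 0x3f;
    f.sia     = (w3 >> 31) & 0x1;
    return f;
}
```

Fed the words from @c7df5660, this reports TRB type 5 (Isoch), a 0xb4c-byte transfer with IOC and SIA set, and cycle bit 0. Consecutive buffer pointers advance by exactly that length. The last two TRBs shown (0x80001425) differ only in carrying cycle bit 1, and the cycle bit is what lets the host tell freshly queued TRBs from stale ones.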

> The printed dequeue pointer is a virtual pointer, so it's not very
> helpful in correlating where the endpoint dequeue pointer is.

> Andiry, maybe you can rev the third patch to use xhci_trb_virt_to_dma()?
> That way you can see why the xHCI driver claims the event isn't part of
> the current TD, when the transfer events on the ring look fairly sane (a
> successful event for c7df5660, c7df5670, and c7df5680).
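The translation Sarah asks for works because, within one ring segment, a TRB's bus address is the segment's DMA base plus the TRB's byte offset from the segment's virtual base. What follows is a minimal sketch of that idea under stated assumptions. The struct names and the 64-TRB segment size are hypothetical and chosen for illustration. This is not the driver's xhci_trb_virt_to_dma() code.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_TRBS_PER_SEGMENT 64  /* assumption: segment size for this sketch */

/* Hypothetical 16-byte TRB: four little-endian 32-bit words. */
struct toy_trb {
    uint32_t field[4];
};

/* Hypothetical ring segment: one virtual mapping plus its bus address. */
struct toy_segment {
    struct toy_trb *trbs;  /* CPU (virtual) address of the segment */
    uint64_t dma;          /* bus address the host controller sees */
};

/* Return the DMA address of @trb inside @seg, or 0 if @trb is not a
 * TRB slot of @seg. Integer arithmetic avoids comparing pointers that
 * may point into different objects. */
static uint64_t toy_trb_virt_to_dma(const struct toy_segment *seg,
                                    const struct toy_trb *trb)
{
    uintptr_t base, p, bytes;

    if (seg == NULL || trb == NULL)
        return 0;
    base = (uintptr_t)seg->trbs;
    p = (uintptr_t)trb;
    if (p < base)
        return 0;
    bytes = p - base;
    if (bytes % sizeof(struct toy_trb) != 0 ||
        bytes / sizeof(struct toy_trb) >= TOY_TRBS_PER_SEGMENT)
        return 0;
    return seg->dma + bytes;
}
```

Printing this value next to the transfer event's DMA pointer makes the two directly comparable. The virtual ffff8800... dequeue pointer in the log cannot be compared that way.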

> Your log shows some other interesting events on the event ring:

> Event Ring:
> @c7df3490 c7df5660 00000000 01000000 01058001
> @c7df34a0 c7df5670 00000000 01000000 01058001
> @c7df34b0 c7df5680 00000000 01000000 01058001
> @c7df34c0 00000000 00000000 15000000 00009401
> @c7df34d0 c7df5690 00000000 01000000 01058001
> @c7df34e0 00000000 00000000 15000000 00009401
> @c7df34f0 00000000 00000000 0f000000 01058001
> @c7df3500 c7df52a0 00000000 01000000 01058000

> This TRB in particular:
> @c7df34c0 00000000 00000000 15000000 00009401

> The completion code is 0x15, or 21, which is an Event Ring Full Error.
> There's a second one on the ring, which is very odd.  If the ring was
> full, how could the host manage to fit two events on the event ring?
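The event ring rows can be read the same way. In an event TRB, the completion code sits in bits 31:24 of word 2. The TRB type, endpoint ID, slot ID and cycle bit sit in word 3. Below is a minimal sketch of that decoding, assuming the xHCI 1.0 field layout. Only the codes and types that appear in this dump are named, and all identifiers are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded view of an xHCI event TRB. */
struct event_fields {
    uint32_t comp_code;  /* word 2 bits 31:24: completion code */
    uint32_t type;       /* word 3 bits 15:10: TRB type */
    uint32_t ep_id;      /* word 3 bits 20:16: endpoint ID (DCI) */
    uint32_t slot_id;    /* word 3 bits 31:24: slot ID */
    uint32_t cycle;      /* word 3 bit 0: cycle bit */
};

static struct event_fields decode_event(uint32_t w2, uint32_t w3)
{
    struct event_fields e;

    e.comp_code = (w2 >> 24) & 0xff;
    e.type      = (w3 >> 10) & 0x3f;
    e.ep_id     = (w3 >> 16) & 0x1f;
    e.slot_id   = (w3 >> 24) & 0xff;
    e.cycle     = w3 & 0x1;
    return e;
}

/* Only the completion codes seen in this thread are named. */
static const char *comp_code_name(uint32_t code)
{
    switch (code) {
    case 1:  return "Success";
    case 15: return "Ring Overrun";
    case 21: return "Event Ring Full Error";
    default: return "other (see the xHCI spec)";
    }
}

/* Only the event TRB types seen in this thread are named. */
static const char *event_type_name(uint32_t type)
{
    switch (type) {
    case 32: return "Transfer Event";
    case 37: return "Host Controller Event";
    default: return "other";
    }
}
```

Applied to the row at c7df34c0, this gives a Host Controller Event (type 37) with completion code 21, the Event Ring Full Error. The transfer events decode as slot 1, endpoint ID 5. The driver's "ep 4" in the warning corresponds to that endpoint, since xhci_hcd indexes endpoints as endpoint ID minus one.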

> The xHCI driver currently doesn't handle expanding the event ring, but I
> had thought that it wouldn't get 62 events in one interrupt.  I think
> there's some deeper issue here.  Maybe my commit to move the hardware
> event ring dequeue pointer writes after all events have been handled is
> to blame?
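The suspicion about deferred dequeue pointer writes can be illustrated with a toy model. This is a sketch under stated assumptions, not a reproduction of the bug. The model uses a single-segment event ring, and the host treats the ring as full when advancing its enqueue index would land on the dequeue index software last wrote to the hardware register (ERDP), with one slot kept empty. Each step, the host posts one event and software handles one. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical toy model of one event ring segment (not driver code). */
struct toy_ring {
    unsigned size;       /* number of TRB slots */
    unsigned enq;        /* host's enqueue index */
    unsigned sw_deq;     /* how far software has handled events */
    unsigned hw_deq;     /* dequeue index last written to "ERDP" */
    unsigned full_hits;  /* times the host found the ring full */
};

/* Host posts one event; the ring is full if that would reach hw_deq. */
static bool toy_host_post(struct toy_ring *r)
{
    unsigned next = (r->enq + 1) % r->size;

    if (next == r->hw_deq) {
        r->full_hits++;
        return false;
    }
    r->enq = next;
    return true;
}

/* Run @steps steps of "host posts, software handles one event".
 * With @defer_erdp, hw_deq is only updated once, after the loop,
 * instead of after every handled event. Returns ring-full hits. */
static unsigned toy_simulate(unsigned size, unsigned steps, bool defer_erdp)
{
    struct toy_ring r = { size, 0, 0, 0, 0 };

    for (unsigned i = 0; i < steps; i++) {
        toy_host_post(&r);
        if (r.sw_deq != r.enq) {
            r.sw_deq = (r.sw_deq + 1) % r.size;
            if (!defer_erdp)
                r.hw_deq = r.sw_deq;
        }
    }
    r.hw_deq = r.sw_deq;  /* final write at the end of the "interrupt" */
    return r.full_hits;
}
```

With an 8-slot ring and a 20-step burst, per-event writes never hit the full condition, while the deferred variant reports 13 ring-full hits. Slots that software has already handled are not returned to the host until the batch ends. Whether that is what happens on this hardware is exactly the open question.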

> Sander, do you have a commit with this short description:
> "xhci: Minimize HW event ring dequeue pointer writes."?

> The last TRB on the event ring is an isochronous Ring Overrun Event:
> @c7df34f0 00000000 00000000 0f000000 01058001

> That should be harmless.  It just means the driver didn't queue an
> isochronous transfer before the host went to look at the ring.  The
> periodic schedule will be started on the next transfer when the doorbell
> is rung.  Perhaps since you're using Xen, the computer is rather slow?

> Sarah Sharp

-- 
Best regards,
 Sander                            mailto:linux@xxxxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html