On Fri, Apr 23, 2010 at 06:36:04PM +0800, Andiry Xu wrote: > On Thu, 2010-04-22 at 12:43 -0700, Sarah Sharp wrote: > > On Tue, Apr 20, 2010 at 05:54:41PM +0800, Libin Yang wrote: > > > This is the version 4 for ISOC patches. > > > > > > The difference from v3 is to add a transfer ring check in > > > xhci_queue_isoc_tx_prepare() before inserting any tds of the urb to the ring. > > > > I tested this on my FS webcam that was causing a scheduling while atomic > > oops on the last version of the patch. With the current 3 isochronous > > patches (but without your patch to allocate bigger ring for isochronous > > endpoints), the driver runs out of room on the endpoint ring (expected) > > and does not oops the system. I see the isochronous transfers > > successfully canceled, so you've fixed that bug. :) > > > > With your patch to enlarge the isoc transfer ring, I successfully get > > video out of the FS webcam! > > > > Unfortunately, I tried it with my HS webcam, and that does not work. > > The log file gets garbled with the amount of messages coming from the > > driver, but I see a lot of "ERROR Transfer event TRB DMA ptr not part of > > current TD" messages. It seems like the driver and the host controller > > got out of sync somewhere, and the host was giving back transfers for > > isoc TDs, and the driver was skipping over them. But the log file where > > the transfers were enqueued was garbled, so I can't see exactly what's > > happening. > > > > I tried applying a patch to turn off a lot of the debugging, and then > > the webcam worked, so I'm not sure if it's an intermittent failure. I > > need to test more. In the mean time, the log file I have from the high > > speed webcam failure is at > > > > http://minilop.net/~sarah/isoc-hs-webcam-v4-patches.tar.gz > > > > Maybe you can see the problem quicker than I? > > > > Hmm... I'm curious about this. Can you reproduce this issue every time > with debug message on? > > Can you add some debug message to print out the following information > when the error happens? > The address of ep_ring->dequeue, td->last_trb, event and event_dma, and > call xhci_debug_ring(xhci, xhci->event_ring) and xhci_debug_ring(xhci, > ep_ring). I caught the bug again, but unfortunately, I had xHCI debugging turned off at the time and I only got the warning/error level messages. I turned off the len=0 bug fix I had added to get your compiler warning to go away, so I'm not sure if the error is related to the bug you need to fix in your next patchset. Here's a snippet of the output I captured: Apr 27 13:42:07 xanatos kernel: [12041.969986] usb 1-3: Installing alternate setting 11 Apr 27 13:42:07 xanatos kernel: [12041.970221] usb 1-3: ep 0x81 - rounding interval to 1 microframes Apr 27 13:42:09 xanatos kernel: [12044.124762] usb 1-3: Installing alternate setting 0 Apr 27 13:42:09 xanatos kernel: [12044.124863] xhci_hcd 0000:05:00.0: WARN Event TRB for slot 1 ep 2 with no TDs queued? Apr 27 13:42:09 xanatos kernel: [12044.124868] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32 Apr 27 13:42:09 xanatos kernel: [12044.125804] usb 1-3: Installing alternate setting 0 Apr 27 13:42:11 xanatos kernel: [12046.455090] usb 1-3: Installing alternate setting 5 Apr 27 13:42:11 xanatos kernel: [12046.455340] usb 1-3: ep 0x81 - rounding interval to 1 microframes Apr 27 13:42:16 xanatos kernel: [12051.595984] xhci_queue_intr_tx: 6 callbacks suppressed Apr 27 13:42:37 xanatos kernel: [12072.581713] usb 1-3: Installing alternate setting 0 Apr 27 13:42:37 xanatos kernel: [12072.667674] usb 1-3: Installing alternate setting 0 Apr 27 13:42:45 xanatos kernel: [12080.229229] usb 1-3: Installing alternate setting 11 Apr 27 13:42:45 xanatos kernel: [12080.229479] usb 1-3: ep 0x81 - rounding interval to 1 microframes Apr 27 13:42:45 xanatos kernel: [12080.477101] xhci_hcd 0000:05:00.0: ERROR Transfer event TRB DMA ptr not part of current TD Apr 27 13:42:45 xanatos kernel: [12080.477114] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32 Apr 27 13:42:45 xanatos kernel: [12080.477121] xhci_hcd 0000:05:00.0: Current td->first_trb (DMA) = 0x3cd9d870 Apr 27 13:42:45 xanatos kernel: [12080.477127] trb_in_td - suspect DMA = 0x0 Apr 27 13:42:45 xanatos kernel: [12080.477134] trb_in_td - start DMA = 0x3cd9d870, end DMA = 0x3cd9d870,seg end DMA = 0x3cd9dbf0 Apr 27 13:42:45 xanatos kernel: [12080.477269] xhci_hcd 0000:05:00.0: ERROR Transfer event TRB DMA ptr not part of current TD Apr 27 13:42:45 xanatos kernel: [12080.477279] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32 Apr 27 13:42:45 xanatos kernel: [12080.477285] xhci_hcd 0000:05:00.0: Current td->first_trb (DMA) = 0x3cd9d870 Apr 27 13:42:45 xanatos kernel: [12080.477291] trb_in_td - suspect DMA = 0x3cd9d890 Apr 27 13:42:45 xanatos kernel: [12080.477298] trb_in_td - start DMA = 0x3cd9d870, end DMA = 0x3cd9d870,seg end DMA = 0x3cd9dbf0 Apr 27 13:42:45 xanatos kernel: [12080.477269] xhci_hcd 0000:05:00.0: ERROR Transfer event TRB DMA ptr not part of current TD Apr 27 13:42:45 xanatos kernel: [12080.477279] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32 Apr 27 13:42:45 xanatos kernel: [12080.477285] xhci_hcd 0000:05:00.0: Current td->first_trb (DMA) = 0x3cd9d870 Apr 27 13:42:45 xanatos kernel: [12080.477291] trb_in_td - suspect DMA = 0x3cd9d890 Apr 27 13:42:45 xanatos kernel: [12080.477298] trb_in_td - start DMA = 0x3cd9d870, end DMA = 0x3cd9d870,seg end DMA = 0x3cd9dbf0 Apr 27 13:42:45 xanatos kernel: [12080.477392] xhci_hcd 0000:05:00.0: ERROR Transfer event TRB DMA ptr not part of current TD Apr 27 13:42:45 xanatos kernel: [12080.477399] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32 Apr 27 13:42:45 xanatos kernel: [12080.477405] xhci_hcd 0000:05:00.0: Current td->first_trb (DMA) = 0x3cd9d870 Apr 27 13:42:45 xanatos kernel: [12080.477411] trb_in_td - suspect DMA = 0x3cd9d8a0 Apr 27 13:42:45 xanatos kernel: [12080.477418] trb_in_td - start DMA = 0x3cd9d870, end DMA = 0x3cd9d870,seg end DMA = 0x3cd9dbf0 Apr 27 13:42:45 xanatos kernel: [12080.477518] xhci_hcd 0000:05:00.0: ERROR Transfer event TRB DMA ptr not part of current TD Apr 27 13:42:45 xanatos kernel: [12080.477525] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32 Apr 27 13:42:45 xanatos kernel: [12080.477531] xhci_hcd 0000:05:00.0: Current td->first_trb (DMA) = 0x3cd9d870 Apr 27 13:42:45 xanatos kernel: [12080.477537] trb_in_td - suspect DMA = 0x3cd9d8b0 Apr 27 13:42:45 xanatos kernel: [12080.477543] trb_in_td - start DMA = 0x3cd9d870, end DMA = 0x3cd9d870,seg end DMA = 0x3cd9dbf0 The full log is here: http://minilop.net/~sarah/logs/isoc-and-pm-2010-04-27-13-38.txt What the log shows is that the hardware gave back a transfer completion with a DMA pointer of all zeros (the suspect DMA = 0x0 line). I can't tell what the completion code was, only that the endpoint ring that was indexed in the event had at least one TD on the ring. After the transfer event with the zeroed DMA pointer, the host attempted to give back several TDs. It looks like those events are for the same isoc ring, after the TD at the head of the endpoint list. The ring handler doesn't complete the queued TD when it receives an event that looks erroneous, so the endpoint gets stuck at one TD. Now, I see there's this note in the xHCI spec: "Note: The TRB Pointer field of Missed Service Error Transfer Event may be ‘0’." Section 4.10.3.2 talks about what happens when the host can't transfer the data within the requested service interval, and it says that it may give a transfer event completion code of "Missed Service Error" with a DMA pointer of zero. I wonder if that's what I'm seeing? The DMA pointers in the transfer completion events do look a bit weird even with that theory: Apr 27 13:42:45 xanatos kernel: [12080.477127] trb_in_td - suspect DMA = 0x0 Apr 27 13:42:45 xanatos kernel: [12080.477291] trb_in_td - suspect DMA = 0x3cd9d890 Apr 27 13:42:45 xanatos kernel: [12080.477291] trb_in_td - suspect DMA = 0x3cd9d890 Apr 27 13:42:45 xanatos kernel: [12080.477411] trb_in_td - suspect DMA = 0x3cd9d8a0 Apr 27 13:42:45 xanatos kernel: [12080.477537] trb_in_td - suspect DMA = 0x3cd9d8b0 Why would it giveback 0x3cd9d890 twice? I'm also wondering about this line: Apr 27 13:42:09 xanatos kernel: [12044.124863] xhci_hcd 0000:05:00.0: WARN Event TRB for slot 1 ep 2 with no TDs queued? If this is related to the missed service error, then you need to add checking for that completion code before attempting to find the endpoint segment that the event came from. So there would need to be some completion code checking before this chunk of code: /* Is this a TRB in the currently executing TD? */ event_seg = trb_in_td(ep_ring->deq_seg, ep_ring->dequeue, td->last_trb, event_dma, false); if (!event_seg) { /* HC is busted, give up! */ xhci_err(xhci, "ERROR Transfer event TRB DMA ptr not part of current TD\n"); The xHCI driver should just giveback the first TD in the list when it sees that completion code, since the host is going to advance to the next TD. I'm not sure what completion code you should give the URB. Maybe -EREMOTEIO for a short transfer and set the transferred_length to zero? I'll try to reproduce this with full debugging turned on, but if it is a missed service error, then it's probably subject to hardware/bus race conditions and may not be easy to trigger. Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html