On Fri, Dec 14, 2012 at 11:02:07AM -0500, Alan Stern wrote: > On Mon, 10 Dec 2012, Piergiorgio Sartor wrote: > > > > Here's an idea. This just occurred to me. Maybe when the driver is > > > waiting for the async schedule to turn off, new QH's should not be > > > added to the schedule. The driver could wait and add them after the > > > schedule was off. I didn't do it that way because it would slow things > > > down and add complexity, but maybe that's what the nVidia hardware > > > needs. > > > > How difficult is it? > > Would it be possible to have as a patch I could try? > > Instead of doing this, I wrote a simpler patch that removes the limit > on the number of polls. It also writes a debugging message whenever > more than 20 ms of polling is needed, so we can see what your > controller is actually doing. Hi Alan, thanks for the patch, I applied it and tested. I've bad news... Unless I made some mistake (which can always be), I did not see any of the logs of this patch in the output, despite having seen the problem occuring (this time I used "cpuburn" to overload the system). The only reasonable log was like the previous one: [ 248.833978] ehci_hcd 0000:00:0b.1: alan start cur time 4294916032 last scan 4294885464 [ 248.833991] ehci_hcd 0000:00:0b.1: command 10035 cmd reg 10075 io watchdog 1 async count 10 [ 248.833998] ehci_hcd 0000:00:0b.1: async list: [ 248.834006] ehci_hcd 0000:00:0b.1: qh ffff880149fcfe00 hw ffff8800af800000 dma af800000 next ffff880146d70380 hw_next af8009c2 [ 248.834013] ehci_hcd 0000:00:0b.1: qh ffff880146d70380 hw ffff8800af8009c0 dma af8009c0 next ffff880146c95b00 hw_next af800a82 [ 248.834020] ehci_hcd 0000:00:0b.1: qh ffff880146c95b00 hw ffff8800af800a80 dma af800a80 next ffff880149b51f00 hw_next af800cc2 [ 248.834027] ehci_hcd 0000:00:0b.1: qh ffff880149b51f00 hw ffff8800af800cc0 dma af800cc0 next ffff880145d17e80 hw_next af800f02 [ 248.834033] ehci_hcd 0000:00:0b.1: qh ffff880145d17e80 hw ffff8800af800f00 dma af800f00 next ffff88013d137180 hw_next af800c02 [ 248.834040] ehci_hcd 0000:00:0b.1: qh ffff88013d137180 hw ffff8800af800c00 dma af800c00 next ffff880148bc0000 hw_next af800d82 [ 248.834046] ehci_hcd 0000:00:0b.1: qh ffff880148bc0000 hw ffff8800af800d80 dma af800d80 next ffff8801410f9f00 hw_next af800b42 [ 248.834053] ehci_hcd 0000:00:0b.1: qh ffff8801410f9f00 hw ffff8800af800b40 dma af800b40 next ffff880147b8bc80 hw_next af8007e2 [ 248.834060] ehci_hcd 0000:00:0b.1: qh ffff880147b8bc80 hw ffff8800af8007e0 dma af8007e0 next ffff880140b49180 hw_next af800e42 [ 248.834066] ehci_hcd 0000:00:0b.1: qh ffff880140b49180 hw ffff8800af800e40 dma af800e40 next ffff880141f8ac00 hw_next af8008a2 [ 248.834073] ehci_hcd 0000:00:0b.1: qh ffff880141f8ac00 hw ffff8800af8008a0 dma af8008a0 next (null) hw_next af800002 [ 248.834080] ehci_hcd 0000:00:0b.1: This qh link time 4294885466 enqueue time 4294885466 td token 1f8c80 ov token 0 [ 248.834087] ehci_hcd 0000:00:0b.1: qh ffff880146d70380 naf800a82 info 42002210 40000000 qtd af81ccc0 [ 248.834095] ehci_hcd 0000:00:0b.1: overlay td ffff8800af8009d0 nae2531e0 00000001 t00000000 p0=af87fa9f [ 248.834103] ehci_hcd 0000:00:0b.1: dma ae2531e0 td ffff8800ae2531e0 naf81ef60 00000001 t001f8c80 p0=af87fa80 [ 248.834111] ehci_hcd 0000:00:0b.1: dummy af81ef60 td ffff8800af81ef60 n00000001 00000001 t00000040 p0=00000000 [ 248.845154] ehci_hcd 0000:00:0b.1: giveback urb ffff88014980dcc0 actual 0 [ 248.845162] ehci_hcd 0000:00:0b.1: alan end Note that there was not anymore the log line before "alan start". I suspect the previous error was something else, maybe something it happens seldom, since sometimes I saw some timeout errors, but they were almost always not a problem. Any ideas? Thanks again, bye, pg > > Alan Stern > > > > > Index: usb-3.7/drivers/usb/host/ehci-timer.c > =================================================================== > --- usb-3.7.orig/drivers/usb/host/ehci-timer.c > +++ usb-3.7/drivers/usb/host/ehci-timer.c > @@ -113,14 +113,15 @@ static void ehci_poll_ASS(struct ehci_hc > > if (want != actual) { > > - /* Poll again later, but give up after about 20 ms */ > - if (ehci->ASS_poll_count++ < 20) { > - ehci_enable_event(ehci, EHCI_HRTIMER_POLL_ASS, true); > - return; > - } > - ehci_dbg(ehci, "Waited too long for the async schedule status (%x/%x), giving up\n", > - want, actual); > + /* Poll again later */ > + ehci_enable_event(ehci, EHCI_HRTIMER_POLL_ASS, true); > + ++ehci->ASS_poll_count; > + return; > } > + > + if (ehci->ASS_poll_count > 20) > + ehci_dbg(ehci, "ASS poll count reached %d\n", > + ehci->ASS_poll_count); > ehci->ASS_poll_count = 0; > > /* The status is up-to-date; restart or stop the schedule as needed */ > @@ -159,14 +160,14 @@ static void ehci_poll_PSS(struct ehci_hc > > if (want != actual) { > > - /* Poll again later, but give up after about 20 ms */ > - if (ehci->PSS_poll_count++ < 20) { > - ehci_enable_event(ehci, EHCI_HRTIMER_POLL_PSS, true); > - return; > - } > - ehci_dbg(ehci, "Waited too long for the periodic schedule status (%x/%x), giving up\n", > - want, actual); > + /* Poll again later */ > + ehci_enable_event(ehci, EHCI_HRTIMER_POLL_PSS, true); > + return; > } > + > + if (ehci->PSS_poll_count > 20) > + ehci_dbg(ehci, "PSS poll count reached %d\n", > + ehci->PSS_poll_count); > ehci->PSS_poll_count = 0; > > /* The status is up-to-date; restart or stop the schedule as needed */ -- piergiorgio -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html