Re: ehci with full speed audio device leaking sitds

Andy Leiserson <andy@xxxxxxxxxxxxx> · Mon, 28 Jan 2013 22:33:41 -0800

On Sun, Jan 27, 2013 at 08:07:01PM -0500, Alan Stern wrote:
> This is a place where 3.8 has changed from 3.6.  Does your problem go 
> away if you revert the first hunk of your patch and simply set
> 
> 	ehci->next_frame = (now_frame - 1) & fmask;
> 
> here?

Yes, this by itself does resolve the problem.

I also tried applying the changes since 3.6.11 that touch ehci-sched.c,
and then changing the logic for last_iso_frame to do as above and start
the next scan one frame earlier, and that's also working.

Thanks.

On Sun, Jan 27, 2013 at 08:07:01PM -0500, Alan Stern wrote:
> On Sun, 27 Jan 2013, Andy Leiserson wrote:
> 
> > Please CC me on replies, as I'm not on the list.
> > 
> > My system is an AR9132 mips SoC, kernel 3.6.11 with OpenWrt patches.
> > The audio device is a C-Media usb audio adapter, IDs 0d8c/000e. It is a
> > full speed device, and is connected directly to the only USB port. I'm
> > using only the output, but it does have a mic input as well. (There are
> > also various controls, e.g. mute, suggested by the USB descriptors, but
> > those aren't present on my device.)
> > 
> > The device was working fine with OpenWrt-patched kernel 3.3.8.
> > 
> > With the new kernel, I have the following problems:
> >  * audio output is choppy (underruns). the usb audio driver
> >    intermittently prints:
> > 
> >    ALSA sound/usb/pcm.c:1187 delay: estimated 624, actual 288
> > 
> >    (I realize the message is not directly indicative of an underrun, but
> >    I believe the underruns and the message are correlated)
> > 
> >    This tends to get worse as playback progresses, and sometimes there
> >    are "cannot submit urb" messages with error -27, and sometimes
> >    playback hangs.
> > 
> >  * audio programs almost always hang in "D" state when closing the audio
> >    device.
> > 
> > I investigated the second problem first in hopes I could eliminate the
> > reboot after each experiment, but in the end, both problems went away
> > simultaneously.
> 
> You should be aware that there have been changes to the periodic 
> scanning code in ehci-hcd since the kernel you've been using.  It might 
> matter (although I suspect it doesn't).
> 
> > I tracked the hang to the sleep in ehci_endpoint_disable that waits for
> > active iso transactions to finish (indicated by stream->td_list being
> > empty).
> > 
> > My theory is that the changes in
> > f42890782241a60d107f23d08089a4a12b507a11 ("USB: EHCI: simplify
> > isochronous scanning") result in leakage of sitds when still-active
> > sitds in the previous frame are skipped by scan_isoc. I don't know why
> > this results in the underruns, but it is a low-memory embedded system,
> > so possibly it's just leaking enough descriptors to run out of
> > dma-eligible memory.
> 
> It looks like you are right.  Those sitds are supposed to get 
> deallocated the next time scan_isoc runs, but they probably aren't.
> 
> > The patch below restores the behavior before commit f428 of stopping the
> > scan immediately once a still-active descriptor is found (and leaving
> > next_frame at the previous frame rather than the current frame if
> > appropriate). This patch resolves the problems I've been seeing, but I
> > don't know if it's the best fix.
> 
> Stopping the scan is the wrong thing to do, but I think you may be 
> right about the next_frame value.
> 
> > I have various ftrace captures, which I can post somewhere if there's
> > interest. Some of them have the iso stream refcounting restored, and
> > clearly show that there aren't as many puts as gets. The detail of
> > scan_isoc behavior isn't visible in the traces, though.
> 
> The gets and puts shouldn't matter.  As long as the sitd structures get 
> deallocated when they aren't active any more, everything should work 
> properly.
> 
> > (An aside: I noticed there is an rmb() before the itd status check, but
> > not before the sitd status check.)
> 
> That's because the itd code loops over the members of 
> q.itd->hw_transaction whereas the sitd code does nothing of the sort.
> 
> > --- a/drivers/usb/host/ehci-sched.c	2013-01-26 19:37:21.000000000 -0800
> > +++ b/drivers/usb/host/ehci-sched.c	2013-01-26 19:39:25.000000000 -0800
> > @@ -2296,7 +2296,7 @@
> >  					type = Q_NEXT_TYPE(ehci,
> >  							q.sitd->hw_next);
> >  					q = *q_p;
> > -					break;
> > +					goto end_scan;
> 
> No, we definitely shouldn't be doing this.
> 
> >  				}
> >  
> >  				/* Take finished SITDs out of the schedule
> > @@ -2336,5 +2336,6 @@
> >  			break;
> >  		frame = (frame + 1) & fmask;
> >  	}
> > -	ehci->next_frame = now_frame;
> > +end_scan:
> > +	ehci->next_frame = frame;
> >  }
> 
> This is a place where 3.8 has changed from 3.6.  Does your problem go 
> away if you revert the first hunk of your patch and simply set
> 
> 	ehci->next_frame = (now_frame - 1) & fmask;
> 
> here?
> 
> Alan Stern
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html