Re: Query about xrun on usb/pcm

Takashi Iwai <tiwai@xxxxxxx> · Tue, 29 Nov 2022 08:45:14 +0100

On Mon, 28 Nov 2022 23:51:55 +0100,
Carl Hetherington wrote:
> 
> Hi Takashi,
> 
> Thank you for your continued attention with this!
> 
> [snip]
> 
> > > I'll see if can apply a similar fix to this case, though to my naive
> > > eyes it looks a little trickier as the xrun is found in the snd_pcm
> > > code rather than the USB code.  Any suggestions most welcome!
> >
> > OK, then it's a bit different problem, and not so trivial to fix in
> > the kernel side alone, I'm afraid.  Basically it's a race between
> > start and stop of two streams.  The key point is that, for stopping a
> > (USB) stream, a sync-stop operation is needed, and this can't be
> > performed at the PCM trigger itself (which is an atomic operation).
> > So, the kernel trigger may at most return an error there.
> >
> > I assume that it's from snd_usb_endpoint_start() and it returning
> > -EPIPE error.  If so, we may change the PCM core code to set the PCM
> > state again XRUN in such an error case, so that application may repeat
> > the standard recovery process.  Something like below.
> 
> Thanks for the suggestion.  I experimented a little with this, but I
> think the problem I'm seeing is that (even if the application knows it
> should retry the snd_pcm_prepare() step) we still end up with an endpoint
> in EP_STATE_STOPPING while the corresponding stop_operating flag is 0.

Ah, I guess that's a fallout in the logic.  When XRUN happens at start
-- receiving an -EPIPE error at snd_pcm_do_start() -- then the patch
sets the XRUN state.  This assumed that the stream gets stopped the
following snd_pcm_undo_start() call.  Indeed it does stop but there we
forgot setting stop_operating flag unlike what snd_pcm_stop() does.

> This means that snd_pcm_sync_stop will never call the USB sync_stop
> handler, which AFAICS is the only way (?) the endpoint can get back to
> EP_STATE_STOPPED.
> 
> In my error case, the code in snd_pcm_sync_stop sets stop_operating to
> false (perhaps assuming that substream->ops->sync_stop will "succeed"
> in setting any STOPPING endpoints to STOPPED) but then this doesn't
> happen because of this xrun that arrives halfway through the sync_stop
> operation.
> 
> I experimented with removing the check at the top of snd_pcm_sync_stop,
> so that we enter the if body regardless of
> substream->runtime->stop_operating, and making my application retry
> snd_pcm_prepare() if it fails with -EPIPE, and this seems to "fix" my
> problem.  Obviously this causes more (unnecessary) calls to the
> sync_stop() entry point...
> 
> I'd be grateful of any thoughts you have about that.

How about the revised patch below?


Takashi

-- 8< --

--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -1424,16 +1424,28 @@ static int snd_pcm_pre_start(struct snd_pcm_substream *substream,
 static int snd_pcm_do_start(struct snd_pcm_substream *substream,
 			    snd_pcm_state_t state)
 {
+	int err;
+
 	if (substream->runtime->trigger_master != substream)
 		return 0;
-	return substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START);
+	err = substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START);
+	/* XRUN happened during the start; usually we do trigger(STOP)
+	 * then set the PCM state to XRUN, but in this case, the stream
+	 * is stopped in snd_pcm_undo_start() right after this point without
+	 * knowing the reason -- so set the PCM state beforehand as exception.
+	 */
+	if (err == -EPIPE)
+		__snd_pcm_set_state(substream->runtime, SNDRV_PCM_STATE_XRUN);
+	return err;
 }
 
 static void snd_pcm_undo_start(struct snd_pcm_substream *substream,
 			       snd_pcm_state_t state)
 {
-	if (substream->runtime->trigger_master == substream)
+	if (substream->runtime->trigger_master == substream) {
 		substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP);
+		substream->runtime->stop_operating = true;
+	}
 }
 
 static void snd_pcm_post_start(struct snd_pcm_substream *substream,