On Mon, 28 Nov 2022 23:51:55 +0100, Carl Hetherington wrote: > > Hi Takashi, > > Thank you for your continued attention with this! > > [snip] > > > > I'll see if can apply a similar fix to this case, though to my naive > > > eyes it looks a little trickier as the xrun is found in the snd_pcm > > > code rather than the USB code. Any suggestions most welcome! > > > > OK, then it's a bit different problem, and not so trivial to fix in > > the kernel side alone, I'm afraid. Basically it's a race between > > start and stop of two streams. The key point is that, for stopping a > > (USB) stream, a sync-stop operation is needed, and this can't be > > performed at the PCM trigger itself (which is an atomic operation). > > So, the kernel trigger may at most return an error there. > > > > I assume that it's from snd_usb_endpoint_start() and it returning > > -EPIPE error. If so, we may change the PCM core code to set the PCM > > state again XRUN in such an error case, so that application may repeat > > the standard recovery process. Something like below. > > Thanks for the suggestion. I experimented a little with this, but I > think the problem I'm seeing is that (even if the application knows it > should retry the snd_pcm_prepare() step) we still end up with an endpoint > in EP_STATE_STOPPING while the corresponding stop_operating flag is 0. Ah, I guess that's a fallout in the logic. When XRUN happens at start -- receiving an -EPIPE error at snd_pcm_do_start() -- then the patch sets the XRUN state. This assumed that the stream gets stopped the following snd_pcm_undo_start() call. Indeed it does stop but there we forgot setting stop_operating flag unlike what snd_pcm_stop() does. > This means that snd_pcm_sync_stop will never call the USB sync_stop > handler, which AFAICS is the only way (?) the endpoint can get back to > EP_STATE_STOPPED. > > In my error case, the code in snd_pcm_sync_stop sets stop_operating to > false (perhaps assuming that substream->ops->sync_stop will "succeed" > in setting any STOPPING endpoints to STOPPED) but then this doesn't > happen because of this xrun that arrives halfway through the sync_stop > operation. > > I experimented with removing the check at the top of snd_pcm_sync_stop, > so that we enter the if body regardless of > substream->runtime->stop_operating, and making my application retry > snd_pcm_prepare() if it fails with -EPIPE, and this seems to "fix" my > problem. Obviously this causes more (unnecessary) calls to the > sync_stop() entry point... > > I'd be grateful of any thoughts you have about that. How about the revised patch below? Takashi -- 8< -- --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -1424,16 +1424,28 @@ static int snd_pcm_pre_start(struct snd_pcm_substream *substream, static int snd_pcm_do_start(struct snd_pcm_substream *substream, snd_pcm_state_t state) { + int err; + if (substream->runtime->trigger_master != substream) return 0; - return substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START); + err = substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START); + /* XRUN happened during the start; usually we do trigger(STOP) + * then set the PCM state to XRUN, but in this case, the stream + * is stopped in snd_pcm_undo_start() right after this point without + * knowing the reason -- so set the PCM state beforehand as exception. + */ + if (err == -EPIPE) + __snd_pcm_set_state(substream->runtime, SNDRV_PCM_STATE_XRUN); + return err; } static void snd_pcm_undo_start(struct snd_pcm_substream *substream, snd_pcm_state_t state) { - if (substream->runtime->trigger_master == substream) + if (substream->runtime->trigger_master == substream) { substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP); + substream->runtime->stop_operating = true; + } } static void snd_pcm_post_start(struct snd_pcm_substream *substream,