> > > > > > Hi Takashi,
> > > > > >
> > > > > > Thanks for your reply and suggestions. Finally we have found the root
> > > > > > cause. Seems it's related to both drivers and alsa-lib.
> > > > > >
> > > > > > When two dmix clients run in parallel we get two direct dmix instances.
> > > > > >
> > > > > > 1st dmix instance:
> > > > > > snd_pcm_dmix_open()
> > > > > > snd_pcm_direct_initialize_slave()
> > > > > > save_slave_setting()
> > > > > > Since the driver we are using has SND_PCM_INFO_RESUME flag,
> > > > > > dmix->spcm->info has this flag. Then this flag is cleared in
> > > > > > dmix->shmptr->s.info.
> > > > > >
> > > > > > 2nd dmix instance:
> > > > > > snd_pcm_dmix_open()
> > > > > > snd_pcm_direct_open_secondary_client()
> > > > > > copy_slave_setting()
> > > > > > 2nd dmix->spcm->info is copied from dmix->shmptr->s.info so it
> > > > > > doesn't has this flag.
> > > > > >
> > > > > > If 1st dmix instance resumes firstly it should implement recovery of
> > > > > > slave pcm in snd_pcm_direct_slave_recover(). Because 1st
> > > > > > dmix->spcm->info has SND_PCM_INFO_RESUME, snd_pcm_resume(direct->spcm)
> > > > > > can be called correctly to resume slave pcm.
> > > > >
> > > > > ... and immediately stop the stream, then prepare and restart as a
> > > > > usual restart.
> > > > >
> > > > > > However if 2nd dmix instance resumes firstly,
> > > > > > snd_pcm_resume(direct->spcm) will not be called because it's
> > > > > > spcm->info doesn't has SND_PCM_INFO_RESUME flag. The 1st dmix instance
> > > > > > assumes someone else already did recovery so
> > > > > > snd_pcm_resume(direct->spcm) won't be called neither. In result the
> > > > > > slave pcm fails to resume.
> > > > >
> > > > > Something wrong happening here, then.
> > > > >
> > > > > In dmix, there is no hardware resume at all, but it's always a
> > > > > restart of the stream.
> > > > > The call of snd_pcm_resume() is only temporarily for inconsistencies
> > > > > that can be a problem on some drivers (IIRC dmaengine stuff).
> > > > > That said, dmix does a kind of fake resume, stops and restarts the
> > > > > stream cleanly on the first instance. On the second instance, it's
> > > > > already recovered, hence it bails out.
> > > > >
> > > > > If poll() hangs on the second instance, there can be some other problem.
> > > > > Maybe the resume -> stop -> restart sequence doesn't work with your
> > > > > driver well?
> > > >
> > > > Our dma driver will do PAUSE in system suspend and requires doing RESUME
> > > > in system resume. Current problem is that snd_pcm_resume() is not called
> > > > by both 1st instance and 2nd instance.
> > >
> > > That's weird. Are you really testing with the latest alsa-lib code?
> > >
> > > If application doesn't call snd_pcm_resume(), it means that the PCM
> > > state isn't set to SUSPENDED, so it pretends as if still running.
> > >
> > > Or if you mean that snd_pcm_resume() to the slave PCM isn't called
> > > (even though snd_pcm_resume() is called for the dmix PCM), check
> > > whether snd_pcm_direct_slave_recover() gets called, especially at the
> > > point:
> > >
> > >     /* some buggy drivers require the device resumed before prepared;
> > >      * when a device has RESUME flag and is in SUSPENDED state, resume
> > >      * here but immediately drop to bring it to a sane active state.
> > >      */
> > >     if (state == SND_PCM_STATE_SUSPENDED &&
> > >         (direct->spcm->info & SND_PCM_INFO_RESUME)) {
> > >             snd_pcm_resume(direct->spcm);
> > >             snd_pcm_drop(direct->spcm);
> > >             snd_pcm_direct_timer_stop(direct);
> > >             snd_pcm_direct_clear_timer_queue(direct);
> > >     }
> > >
> > > Try to put debug prints or catch via breakpoint whether this code
> > > path is executed.
> > >
> > > Also, does the issue happen with the latest 6.11-rc kernel, too?
> > > If yes, what if you drop SNDRV_PCM_INFO_RESUME bit flag in the
> > > driver side? Does the problem persist, or it works?
> >
> > I'm working on kernel 6.6 and alsa-lib v1.2.11. It's not so outdated I
> > think and then I will try to switch on the latest version.
> >
> > Indeed I did some debug on this part. Please see my comments inline.
> >
> > int snd_pcm_direct_slave_recover(snd_pcm_direct_t *direct) {
> > ...
> >
> > /* [Chancel]
> >  * When two dmix clients run in parallel we get two direct dmix instances.
> >  * 1st dmix->spcm->info has SND_PCM_INFO_RESUME flag but 2nd dmix doesn't.
>
> OK, that must be the cause. It's because the second open copies the saved
> shmem->s.info into spcm->info at its open time while we already dropped the
> INFO_RESUME bit. All the rest behavior are side effect of this inconsistency.
>
> I guess dropping the INFO_RESUME bit at hw_params and hw_refine should
> work instead. A totally untested fix is below.
>
> (And I believe the drop of INFO_PAUSE should be handled similarly, too,
> instead of dropping spcm->info bit there.)
>
>
> Takashi
>
> --- a/src/pcm/pcm_direct.c
> +++ b/src/pcm/pcm_direct.c
> @@ -1018,6 +1018,7 @@ int snd_pcm_direct_hw_refine(snd_pcm_t *pcm, snd_pcm_hw_params_t *params)
>      }
>      dshare->timer_ticks = hw_param_interval(params, SND_PCM_HW_PARAM_PERIOD_SIZE)->max / dshare->slave_period_size;
>      params->info = dshare->shmptr->s.info;
> +    params->info &= ~SND_PCM_INFO_RESUME;
>  #ifdef REFINE_DEBUG
>      snd_output_puts(log, "DMIX REFINE (end):\n");
>      snd_pcm_hw_params_dump(params, log);
> @@ -1031,6 +1032,7 @@ int snd_pcm_direct_hw_params(snd_pcm_t *pcm, snd_pcm_hw_params_t * params)
>      snd_pcm_direct_t *dmix = pcm->private_data;
>
>      params->info = dmix->shmptr->s.info;
> +    params->info &= ~SND_PCM_INFO_RESUME;
>      params->rate_num = dmix->shmptr->s.rate;
>      params->rate_den = 1;
>      params->fifo_size = 0;
> @@ -1183,8 +1185,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix, snd_pcm_t *spcm)
>      COPY_SLAVE(buffer_time);
>      COPY_SLAVE(sample_bits);
>      COPY_SLAVE(frame_bits);
> -
> -    dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;
>  }
>
>  #undef COPY_SLAVE

Thanks Takashi,

This patch fixes the issue on my side. In my test, both dmix1->spcm->info
and dmix2->spcm->info have the SND_PCM_INFO_RESUME flag, and snd_pcm_resume()
is successfully called by whichever instance resumes first.

I don't fully understand the patch, though. Do you mean to drop
SND_PCM_INFO_RESUME from dmix and keep it in the slave pcm?

BTW, when will this patch be merged to mainline?

Regards,
Chancel Liu
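
For reference, the flag flow discussed in this thread can be sketched as a
small standalone toy model. This is only an illustration under stated
assumptions, not alsa-lib code: every toy_* name and both TOY_INFO_* bit
values are hypothetical stand-ins, and only the order of operations (save
into shared memory, copy into the second client, mask when reporting params)
is taken from the messages above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in bits; NOT the real SND_PCM_INFO_* values. */
#define TOY_INFO_RESUME (1u << 0)
#define TOY_INFO_MMAP   (1u << 1)

struct toy_shm    { unsigned int info; };      /* models shmptr->s.info   */
struct toy_client { unsigned int spcm_info; }; /* models dmix->spcm->info */

/* First client: store the slave's info bits in shared memory.
 * strip_resume != 0 models the behaviour before the patch, where the
 * RESUME bit was cleared in the saved copy. */
static void toy_save_slave_setting(struct toy_shm *shm,
                                   unsigned int slave_info, int strip_resume)
{
    assert(shm != NULL);
    shm->info = slave_info;
    if (strip_resume)
        shm->info &= ~TOY_INFO_RESUME;
}

/* Second client: its slave info is copied from shared memory. */
static void toy_copy_slave_setting(struct toy_client *c,
                                   const struct toy_shm *shm)
{
    assert(c != NULL && shm != NULL);
    c->spcm_info = shm->info;
}

/* After the patch: the RESUME bit is masked only in what is reported
 * through hw_params/hw_refine, not in the saved slave info. */
static unsigned int toy_params_info(const struct toy_shm *shm)
{
    assert(shm != NULL);
    return shm->info & ~TOY_INFO_RESUME;
}

/* The recover path, reduced to its flag test. */
static int toy_would_resume_slave(const struct toy_client *c)
{
    assert(c != NULL);
    return (c->spcm_info & TOY_INFO_RESUME) != 0;
}

/* Returns 1 if a second client would resume the slave, 0 otherwise. */
static int toy_second_client_resumes(int strip_resume)
{
    struct toy_shm shm;
    struct toy_client second;

    toy_save_slave_setting(&shm, TOY_INFO_RESUME | TOY_INFO_MMAP, strip_resume);
    toy_copy_slave_setting(&second, &shm);
    return toy_would_resume_slave(&second);
}

/* Returns 1 if the reported params still expose RESUME after the patch. */
static int toy_app_sees_resume(void)
{
    struct toy_shm shm;

    toy_save_slave_setting(&shm, TOY_INFO_RESUME | TOY_INFO_MMAP, 0);
    return (toy_params_info(&shm) & TOY_INFO_RESUME) != 0;
}
```

In this model, toy_second_client_resumes(1) corresponds to the failing case
(the second client never sees the RESUME bit), toy_second_client_resumes(0)
to the patched case, and toy_app_sees_resume() shows that the bit stays
hidden in the reported params either way.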