On Thu, 05 Sep 2024 09:44:10 +0200, Chancel Liu wrote:
>
> > >
> > > Hi Takashi,
> > >
> > > Thanks for your reply and suggestions. Finally we have found the root cause.
> > > Seems it's related to both drivers and alsa-lib.
> > >
> > > When two dmix clients run in parallel we get two direct dmix instances.
> > > 1st dmix instance:
> > > snd_pcm_dmix_open()
> > >     snd_pcm_direct_initialize_slave()
> > >         save_slave_setting()
> > > Since the driver we are using has SND_PCM_INFO_RESUME flag,
> > > dmix->spcm->info has this flag. Then this flag is cleared in
> > > dmix->shmptr->s.info.
> > >
> > > 2nd dmix instance:
> > > snd_pcm_dmix_open()
> > >     snd_pcm_direct_open_secondary_client()
> > >         copy_slave_setting()
> > > 2nd dmix->spcm->info is copied from dmix->shmptr->s.info so it doesn't
> > > has this flag.
> > >
> > > If 1st dmix instance resumes firstly it should implement recovery of
> > > slave pcm in snd_pcm_direct_slave_recover(). Because 1st
> > > dmix->spcm->info has SND_PCM_INFO_RESUME, snd_pcm_resume(direct->spcm)
> > > can be called correctly to resume slave pcm.
> >
> > ... and immediately stop the stream, then prepare and restart as a usual
> > restart.
> >
> > > However if 2nd dmix instance resumes firstly,
> > > snd_pcm_resume(direct->spcm) will not be called because it's
> > > spcm->info doesn't has SND_PCM_INFO_RESUME flag. The 1st dmix instance
> > > assumes someone else already did recovery so
> > > snd_pcm_resume(direct->spcm) won't be called neither. In result the
> > > slave pcm fails to resume.
> >
> > Something wrong happening here, then.
> >
> > In dmix, there is no hardware resume at all, but it's always a restart of the
> > stream. The call of snd_pcm_resume() is only temporarily for inconsistencies
> > that can be a problem on some drivers (IIRC dmaengine stuff). That said,
> > dmix does a kind of fake resume, stops and restarts the stream cleanly on the
> > first instance.
> > On the second instance, it's already recovered, hence it bails
> > out.
> >
> > If poll() hangs on the second instance, there can be some other problem.
> > Maybe the resume -> stop -> restart sequence doesn't work with your driver
> > well?
> >
>
> Our dma driver will do PAUSE in system suspend and requires doing RESUME in
> system resume. Current problem is that snd_pcm_resume() is not called by both
> 1st instance and 2nd instance.

That's weird. Are you really testing with the latest alsa-lib code?

If the application doesn't call snd_pcm_resume(), it means that the PCM
state isn't set to SUSPENDED, so it behaves as if it were still running.
Or, if you mean that snd_pcm_resume() on the slave PCM isn't called
(even though snd_pcm_resume() is called for the dmix PCM), check
whether snd_pcm_direct_slave_recover() gets called, especially at this
point:

	/* some buggy drivers require the device resumed before prepared;
	 * when a device has RESUME flag and is in SUSPENDED state, resume
	 * here but immediately drop to bring it to a sane active state.
	 */
	if (state == SND_PCM_STATE_SUSPENDED &&
	    (direct->spcm->info & SND_PCM_INFO_RESUME)) {
		snd_pcm_resume(direct->spcm);
		snd_pcm_drop(direct->spcm);
		snd_pcm_direct_timer_stop(direct);
		snd_pcm_direct_clear_timer_queue(direct);
	}

Try adding debug prints, or set a breakpoint, to see whether this code
path is executed.

Also, does the issue happen with the latest 6.11-rc kernel, too?
If yes, what happens if you drop the SNDRV_PCM_INFO_RESUME bit flag on
the driver side? Does the problem persist, or does it work?

> > > SND_PCM_INFO_RESUME flag has impact on the flow of dmix resume. In my
> > > opinion the first resumed dmix instance should make sure slave pcm can
> > > be recovered properly no matter it's the first opened instance or
> > > secondary opened instance.
> >
> > The snd_pcm_resume() gets called no matter which instance, just the first one
> > who tries to recover the suspended state.
> > (And it's called internally at
> > updating the various state, not necessarily an explicit recovery call.)
> >
>
> Unfortunately if secondary opened instance resumes first it doesn't has
> SND_PCM_INFO_RESUME which causes snd_pcm_resume() never be called.

No, that's a misunderstanding. SND_PCM_INFO_RESUME isn't exposed to the
application in the case of dmix at all; i.e. dmix doesn't support the
full resume, per se. That's the design. So it doesn't matter which
instance gets resumed first.

> > > Do you know why the secondary opened instance clear the
> > > SND_PCM_INFO_RESUME flag? Can we do the following modification?
> > >
> > > diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c
> > > @@ -1183,8 +1226,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix, snd_pcm_t *spcm)
> > >  	COPY_SLAVE(buffer_time);
> > >  	COPY_SLAVE(sample_bits);
> > >  	COPY_SLAVE(frame_bits);
> > > -
> > > -	dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;
> >
> > I don't think so. The clearance of the RESUME flag here is correct.
> > dmix doesn't support the hardware resume feature. It does its own.
> > (And this flag is merely a info for apps, which isn't really evaluated except for
> > the code in dmix workaround there.)
> >
> >
> > Takashi
> >
>
> I think dmix should know what state the real driver is. If driver requires that
> app should do snd_pcm_resume() how can dmix get this information?

The dmix already knows. But the PCM state exposed to applications
isn't always tied 1:1 to the state of the slave PCM.


Takashi
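
For the debug prints mentioned above, here is a minimal standalone sketch of what could be logged. It is not alsa-lib code: MODEL_STATE_*, MODEL_INFO_RESUME and model_takes_resume_path() are hypothetical stand-ins with illustrative values, and the function only mirrors the condition guarding the workaround in snd_pcm_direct_slave_recover(). The idea is to print both inputs of that condition, so that a run on the real system shows whether the state or the info flag keeps the path from being taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the alsa-lib definitions; the values are
 * illustrative only and do not match the real headers. */
enum model_state { MODEL_STATE_RUNNING, MODEL_STATE_SUSPENDED };
#define MODEL_INFO_RESUME 0x1u

/* Mirrors the condition that guards the resume-then-drop workaround and
 * logs both inputs together with the outcome. */
static bool model_takes_resume_path(enum model_state state, unsigned int info)
{
	bool taken = (state == MODEL_STATE_SUSPENDED) &&
		     ((info & MODEL_INFO_RESUME) != 0);

	fprintf(stderr, "slave_recover: state=%d info=0x%x resume_path=%d\n",
		(int)state, info, (int)taken);
	return taken;
}

/* Exercises the three combinations that matter for the discussion. */
static void model_self_check(void)
{
	/* suspended and flag present: the workaround path is taken */
	assert(model_takes_resume_path(MODEL_STATE_SUSPENDED, MODEL_INFO_RESUME));
	/* suspended but flag missing: the path is skipped */
	assert(!model_takes_resume_path(MODEL_STATE_SUSPENDED, 0u));
	/* flag present but not suspended: the path is skipped */
	assert(!model_takes_resume_path(MODEL_STATE_RUNNING, MODEL_INFO_RESUME));
}
```

In the real code the equivalent print would go right before the if statement quoted above, using the actual state and direct->spcm->info values. model_self_check() is only there so the model can be called from a small test main().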