> > > > Hi Takashi,
> > > >
> > > > Thanks for your reply and suggestions. Finally we have found the
> > > > root cause.
> > > > Seems it's related to both drivers and alsa-lib.
> > > >
> > > > When two dmix clients run in parallel we get two direct dmix instances.
> > > > 1st dmix instance:
> > > > snd_pcm_dmix_open()
> > > >   snd_pcm_direct_initialize_slave()
> > > >     save_slave_setting()
> > > > Since the driver we are using has the SND_PCM_INFO_RESUME flag,
> > > > dmix->spcm->info has this flag. Then this flag is cleared in
> > > > dmix->shmptr->s.info.
> > > >
> > > > 2nd dmix instance:
> > > > snd_pcm_dmix_open()
> > > >   snd_pcm_direct_open_secondary_client()
> > > >     copy_slave_setting()
> > > > The 2nd dmix->spcm->info is copied from dmix->shmptr->s.info, so it
> > > > doesn't have this flag.
> > > >
> > > > If the 1st dmix instance resumes first, it should implement recovery of
> > > > the slave pcm in snd_pcm_direct_slave_recover(). Because the 1st
> > > > dmix->spcm->info has SND_PCM_INFO_RESUME,
> > > > snd_pcm_resume(direct->spcm) can be called correctly to resume the
> > > > slave pcm.
> > >
> > > ... and immediately stop the stream, then prepare and restart as a usual
> > > restart.
> > >
> > > > However, if the 2nd dmix instance resumes first,
> > > > snd_pcm_resume(direct->spcm) will not be called because its
> > > > spcm->info doesn't have the SND_PCM_INFO_RESUME flag. The 1st dmix
> > > > instance assumes someone else already did the recovery, so
> > > > snd_pcm_resume(direct->spcm) won't be called there either. As a result
> > > > the slave pcm fails to resume.
> > >
> > > Something wrong happening here, then.
> > >
> > > In dmix, there is no hardware resume at all, but it's always a restart of
> > > the stream. The call of snd_pcm_resume() is only a temporary workaround
> > > for inconsistencies that can be a problem on some drivers (IIRC dmaengine
> > > stuff). That said, dmix does a kind of fake resume, stops and restarts
> > > the stream cleanly on the first instance.
> > > On the second instance, it's already recovered, hence it bails out.
> > >
> > > If poll() hangs on the second instance, there can be some other problem.
> > > Maybe the resume -> stop -> restart sequence doesn't work well with your
> > > driver?
> > >
> >
> > Our dma driver will do PAUSE in system suspend and requires doing RESUME
> > in system resume. The current problem is that snd_pcm_resume() is called
> > by neither the 1st instance nor the 2nd instance.
>
> That's weird. Are you really testing with the latest alsa-lib code?
>
> If application doesn't call snd_pcm_resume(), it means that the PCM
> state isn't set to SUSPENDED, so it pretends as if still running.
>
> Or if you mean that snd_pcm_resume() to the slave PCM isn't called
> (even though snd_pcm_resume() is called for the dmix PCM), check
> whether snd_pcm_direct_slave_recover() gets called, especially at the
> point:
>
> 	/* some buggy drivers require the device resumed before prepared;
> 	 * when a device has RESUME flag and is in SUSPENDED state, resume
> 	 * here but immediately drop to bring it to a sane active state.
> 	 */
> 	if (state == SND_PCM_STATE_SUSPENDED &&
> 	    (direct->spcm->info & SND_PCM_INFO_RESUME)) {
> 		snd_pcm_resume(direct->spcm);
> 		snd_pcm_drop(direct->spcm);
> 		snd_pcm_direct_timer_stop(direct);
> 		snd_pcm_direct_clear_timer_queue(direct);
> 	}
>
> Try to put debug prints or catch via breakpoint whether this code path
> is executed.
>
> Also, does the issue happen with the latest 6.11-rc kernel, too?
> If yes, what if you drop SNDRV_PCM_INFO_RESUME bit flag in the driver
> side? Does the problem persist, or it works?
>

I'm working on kernel 6.6 and alsa-lib v1.2.11. I don't think that is very
outdated, but I will try switching to the latest version.
I did do some debugging on this part. Please see my comments inline.

int snd_pcm_direct_slave_recover(snd_pcm_direct_t *direct)
{
	...
	/* [Chancel]
	 * When two dmix clients run in parallel we get two direct dmix instances.
	 * 1st dmix->spcm->info has SND_PCM_INFO_RESUME flag but 2nd dmix doesn't.
	 * Let's name 1st opened dmix "dmix1" and 2nd dmix "dmix2".
	 * After resume, both dmix1 and dmix2 enter snd_pcm_direct_slave_recover().
	 * Here we assume dmix2 is the instance that gets here first.
	 * dmix2 successfully takes the semaphore lock and dmix1 waits for it.
	 */
	semerr = snd_pcm_direct_semaphore_down(direct, DIRECT_IPC_SEM_CLIENT);
	...
	state = snd_pcm_state(direct->spcm);
	if (state != SND_PCM_STATE_XRUN && state != SND_PCM_STATE_SUSPENDED) {
		/* [Chancel]
		 * dmix2 finds spcm state is SUSPENDED so it will not enter here.
		 * However, later when dmix1 gets the lock and enters here, spcm
		 * state has already been changed to RUNNING by dmix2.
		 * As a result dmix1 assumes some other instance has done the
		 * recovery, so dmix1 returns directly.
		 * snd_pcm_resume() is not called by dmix1.
		 */
		/* ignore... someone else already did recovery */
		semerr = snd_pcm_direct_semaphore_up(direct, DIRECT_IPC_SEM_CLIENT);
		if (semerr < 0) {
			SNDERR("SEMUP FAILED with err %d", semerr);
			return semerr;
		}
		return 0;
	}
	...
	if (state == SND_PCM_STATE_SUSPENDED &&
	    (direct->spcm->info & SND_PCM_INFO_RESUME)) {
		/* [Chancel]
		 * dmix2->spcm->info doesn't have SND_PCM_INFO_RESUME flag.
		 * So this condition is not met.
		 * snd_pcm_resume() is not called by dmix2.
		 */
		snd_pcm_resume(direct->spcm);
		snd_pcm_drop(direct->spcm);
		snd_pcm_direct_timer_stop(direct);
		snd_pcm_direct_clear_timer_queue(direct);
	}
	...
	ret = snd_pcm_prepare(direct->spcm);
	...
	/* [Chancel]
	 * dmix2 calls snd_pcm_start to set spcm state to RUNNING.
	 */
	ret = snd_pcm_start(direct->spcm);
	...
}

The dma driver I'm using supports the pause/resume function. I don't think
dropping SNDRV_PCM_INFO_RESUME is a good fix for this issue.
Besides this driver, I also validated on another driver whose dma doesn't have
this flag. There the issue is gone and both instances work well with
suspend/resume.

Regards,
Chancel Liu

> > > > SND_PCM_INFO_RESUME flag has impact on the flow of dmix resume.
> > > > In my opinion the first resumed dmix instance should make sure the
> > > > slave pcm can be recovered properly, no matter whether it's the first
> > > > opened instance or a secondary opened instance.
> > >
> > > The snd_pcm_resume() gets called no matter which instance, just the first
> > > one who tries to recover the suspended state. (And it's called internally
> > > at updating the various state, not necessarily an explicit recovery call.)
> > >
> >
> > Unfortunately, if the secondary opened instance resumes first, it doesn't
> > have SND_PCM_INFO_RESUME, which causes snd_pcm_resume() never to be called.
>
> No, it's misunderstanding. SND_PCM_INFO_RESUME isn't exposed to the
> application in the case of dmix at all; i.e. dmix doesn't support the
> full resume, per se. That's the design. So it doesn't matter which
> instance gets resumed at first.
>
> > > > Do you know why the secondary opened instance clears the
> > > > SND_PCM_INFO_RESUME flag? Can we do the following modification?
> > > >
> > > > diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c
> > > > @@ -1183,8 +1226,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix, snd_pcm_t *spcm)
> > > >  	COPY_SLAVE(buffer_time);
> > > >  	COPY_SLAVE(sample_bits);
> > > >  	COPY_SLAVE(frame_bits);
> > > > -
> > > > -	dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;
> > >
> > > I don't think so. The clearance of the RESUME flag here is correct.
> > > dmix doesn't support the hardware resume feature. It does its own.
> > > (And this flag is merely an info for apps, which isn't really evaluated
> > > except for the code in dmix workaround there.)
> > >
> > >
> > > Takashi
> > >
> >
> > I think dmix should know what state the real driver is in. If the driver
> > requires that the app should do snd_pcm_resume(), how can dmix get this
> > information?
>
> The dmix already knows. But the PCM state exposed to applications
> isn't always tied as 1:1.
>
>
> Takashi