On Thu, 10 Mar 2022 03:25:27 +0100, S.J. Wang wrote:
>
> Hi
>
> > > > > Hi Takashi Iwai, Jaroslav Kysela
> > > > >
> > > > > We encountered an issue in the pcm_dsnoop use case, could you
> > > > > please help to have a look?
> > > > >
> > > > > *Issue description:*
> > > > > With two instances of a dsnoop-type device running in parallel,
> > > > > after suspend/resume one of the instances hangs in memcpy
> > > > > because a very large copy size is obtained:
> > > > >
> > > > > #3 0x0000ffffa78d5098 in snd_pcm_dsnoop_sync_ptr (pcm=0xaaab06563da0)
> > > > >    at pcm_dsnoop.c:158
> > > > >    dsnoop = 0xaaab06563c20
> > > > >    slave_hw_ptr = 64
> > > > >    old_slave_hw_ptr = 533120
> > > > >    avail = 187651522444320
> > > > >
> > > > > *Reason analysis:*
> > > > > From my analysis, the root cause is that after suspend/resume one
> > > > > instance gets the SND_PCM_STATE_SUSPENDED state from the slave pcm
> > > > > device, then does snd_pcm_prepare() and snd_pcm_start(), which
> > > > > reset dsnoop->slave_hw_ptr and the hw_ptr of the slave pcm device,
> > > > > so the state of this instance is correct. But the other instance
> > > > > may never see SND_PCM_STATE_SUSPENDED from the slave pcm device,
> > > > > because the slave device may already have been recovered by the
> > > > > first instance, so its dsnoop->slave_hw_ptr is not reset. Since
> > > > > the hw_ptr of the slave pcm device has been reset, a very large
> > > > > "avail" size results.
> > > > >
> > > > > *Solution:*
> > > > > I didn't come up with a fix for this issue; there seems to be no
> > > > > easy way to let the other instance know about this case and reset
> > > > > its dsnoop->slave_hw_ptr. Could you please help?
> > > >
> > > > Could you try the topic/pcm-direct-resume branch on
> > > >   https://github.com/tiwai/alsa-lib
> > >
> > > Thanks, I pushed my test result to
> > >   https://github.com/alsa-project/alsa-lib/issues/213
> > > Could you please review?
> >
> > Please keep the discussion on the ML.
>
> I saw you have updated the origin/topic/pcm-direct-resume branch. I
> tested your latest change; it is more stable than before, but I still
> hit the issue once after an overnight test, so it is a very low
> probability.
>
> So I suggest we also need the change below, shall we?

Point taken.  The xrun/suspend check should be right before the slave
hwptr update, yes.  I updated the git repo again.  Will submit the
patch set for the merge as the final version.


thanks,

Takashi
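
[Editor's note] For readers following the analysis above, here is a minimal, self-contained C sketch, not alsa-lib source, of how the huge "avail" value in the backtrace can arise. The boundary constant and the simplified delta computation are illustrative assumptions (a 64-bit long is also assumed); only the two hw_ptr values come from the reported backtrace.

```c
/*
 * Sketch: when the slave hw_ptr has been reset by the other, re-prepared
 * instance, but this instance's cached dsnoop->slave_hw_ptr still holds its
 * pre-suspend value, a boundary wrap correction fires even though no real
 * wrap happened, producing an enormous frame count.
 */
#include <stdio.h>

typedef unsigned long snd_pcm_uframes_t;
typedef long snd_pcm_sframes_t;

int main(void)
{
	/* assumed boundary; real values depend on the buffer size */
	const snd_pcm_uframes_t boundary = 0xaaaa00000000UL;
	snd_pcm_uframes_t old_slave_hw_ptr = 533120; /* cached before suspend */
	snd_pcm_uframes_t slave_hw_ptr = 64;         /* slave already re-prepared */

	/* simplified hw_ptr delta computation */
	snd_pcm_sframes_t diff = (snd_pcm_sframes_t)slave_hw_ptr -
				 (snd_pcm_sframes_t)old_slave_hw_ptr;
	if (diff < 0)
		diff += boundary;	/* wrap correction assumes a real wrap */

	printf("avail-like delta = %ld frames\n", diff); /* absurdly large */
	return 0;
}
```

This is consistent with the suggestion in the reply: checking the slave state for suspend/xrun right before the slave hwptr update lets the second instance resynchronize its cached pointer instead of applying the wrap correction to a pointer that was actually reset.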