On 25/03/2019 09:59, Takashi Iwai wrote: > On Thu, 21 Mar 2019 15:28:23 +0100, > Jon Hunter wrote: >> >> >> On 20/03/2019 17:59, Jon Hunter wrote: >>> >>> On 20/03/2019 17:19, Takashi Iwai wrote: >>>> On Wed, 20 Mar 2019 17:04:14 +0100, >>>> Jon Hunter wrote: >>>>> >>>>> From: Jonathan Hunter <jonathanh@xxxxxxxxxx> >>>>> >>>>> This issue is not observed on the latest mainline kernel since the >>>>> 'ALSA: PCM suspend cleanup' series has been applied and the >>>>> snd_pcm_suspend_all() function call in the HDA codec driver has been >>>>> removed. However, this issue impacts stable kernel branches and so I >>>>> am trying to find a solution for these branches. >>>>> >>>>> When stress testing audio playback across multiple HDA streams >>>>> simultaneously, the following failure is sometimes observed when >>>>> starting playback ... >>>>> >>>>> Unable to set hw params for playback: File descriptor in bad state >>>>> >>>>> The problem is caused because ... >>>>> 1. Playback is starting for one HDA codec and so the chip->open_mutex >>>>> in azx_pcm_open() has been acquired. >>>>> 2. Playback for another HDA codec is starting but is blocked waiting to >>>>> acquire the chip->open_mutex in azx_pcm_open(). >>>>> 3. For the HDA codec that is blocked, its runtime-pm status is still >>>>> enabled from a previous playback session that has since completed and >>>>> been closed, however its autosuspend timeout has not expired yet. >>>>> 4. For the HDA codec that is blocked, the runtime-pm autosuspend timeout >>>>> now occurs calling the runtime-pm suspend callback and this calls >>>>> snd_pcm_suspend_all() placing all PCM streams in the suspended state. >>>>> 5. The block HDA codec is now unblocked and fails to set the HW params >>>>> because the PCM stream is in the suspended state. >>>>> >>>>> The above has been observed on Tegra platforms where the autosuspend >>>>> delay is set to 1 second and there is a delay to an AZX command. This >>>>> highlights that there is a window of time where autosuspend can place >>>>> the HDA PCM stream in the suspended state while opening the stream >>>>> and cause playback to fail. >>>>> >>>>> Looking at commit 3d21ef0b49f8 ('ALSA: pcm: Suspend streams globally via >>>>> device type PM ops') it appears that the call to snd_pcm_suspend_all() >>>>> has now been moved to the to the suspend handler for PCM streams and so >>>>> I am wondering if it would be equivalent to make the following change to >>>>> the HDA codec driver to achieve the same behaviour but only for the HDA >>>>> driver. So far the issue has not been observed with this change. >>>>> >>>>> Please note that I am not sending this as a formal patch as I wanted to >>>>> get some feedback/comments on the above. >>>> >>>> I guess just backporting 3d21ef0b49f8 should be OK even for older >>>> kernels. This assures the PCM stream gets suspended before entering >>>> any parent device suspend call. >>>> >>>> The remaining changes (except for the one for atiixp) are merely >>>> cleanup of superfluous calls, and keeping them are harmless. >>>> >>>> Could you check whether my theory above correct? >>> >>> I believe that you will need that change and 17bc4815de58 because we >>> need to prevent snd_pcm_suspend_all() being called in the pm-runtime >>> suspend callback. Applying only 3d21ef0b49f8 will means that at runtime >>> snd_pcm_suspend_all() can still be called from within the HDA codec >>> driver when the autosuspend timeout occurs. I will test this and >>> confirm. Maybe both could be backported? >> >> I confirmed today that with just 3d21ef0b49f8 the issue can occur, but >> with both 3d21ef0b49f8 and 17bc4815de58 the issue is not seen. >> >> Do you think that it would be appropriate to include these in the stable >> branches? They apply cleanly to v5.0, but we would probably need to >> back-port for early kernels such as v4.19, v4.9, etc. > > I reconsidered the problem again, and noticed that the very same > problem may appear with the system PM, not only with runtime PM. > The suspend can happen at any time, so even a stream in OPEN state may > go to suspend, and you'll hit the same problem. It's just a corner > case so no one really cared much. Yes I was not sure if it could also happen in the system suspend case or not. > So, I think the patch like below should fix the problem. > This can be easily backported to all stable trees, and it alone should > work without backporting the else intrusive changes. > > Could you check whether my theory is correct? Do you want me to test this on the same stable branch I have been testing with where commits 3d21ef0b49f8 and 17bc4815de58 are not present? Cheers Jon -- nvpublic