Re: [alsa-devel] [PATCH] ALSA: hda: Use standard waitqueue for RIRB wakeup

Jon Hunter <jonathanh@xxxxxxxxxx> · Thu, 19 Dec 2019 12:15:48 +0000

On 18/12/2019 14:31, Takashi Iwai wrote:
> On Wed, 18 Dec 2019 15:17:27 +0100,
> Jon Hunter wrote:
>>
>>
>> On 10/12/2019 14:57, Takashi Iwai wrote:
>>> The HD-audio CORB/RIRB communication was programmed in the way that was
>>> documented in the reference decades ago, which is essentially
>>> polling on the waiter side.  It works fine but costs CPU cycles on
>>> some platforms that support only slow communication.  Also, for some
>>> platforms that had unreliable communication, we use a longer wait time
>>> (2 ms), which accumulates to quite a long time if you execute many
>>> verbs in one shot (e.g. at the initialization or resume phase).
>>>
>>> This patch attempts to improve the situation by introducing the
>>> standard waitqueue on the RIRB waiter side instead of polling.  The
>>> test results on my machine show significant improvements.  The time
>>> spent for "cat /proc/asound/card*/codec#*" changed as follows:
>>>
>>> * Intel SKL + Realtek codec
>>>   before the patch:
>>>    0.00user 0.04system 0:00.10elapsed 40.0%CPU
>>>   after the patch:
>>>    0.00user 0.01system 0:00.10elapsed 10.0%CPU
>>>
>>> * Nvidia GP107GL + Nvidia HDMI codec
>>>   before the patch:
>>>    0.00user 0.00system 0:02.76elapsed 0.0%CPU
>>>   after the patch:
>>>    0.00user 0.00system 0:00.01elapsed 17.0%CPU
>>>
>>> So, for Intel chips, the total time is the same, while the total time
>>> is greatly reduced (from 2.76 to 0.01s) for Nvidia chips.
>>> The only negative data here is the increase in CPU time for Nvidia,
>>> but this is presumably the unavoidable cost of faster wakeups.
>>>
>>> Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
>> Starting with next-20191217 I am seeing the following error on one of
>> our Tegra platforms ...
>>
>> tegra-hda 3510000.hda: azx_get_response timeout, switching to polling
>> mode: last cmd=0x404f2d00
>>
>> Bisect points to this commit and, although it does not revert cleanly,
>> if I revert this and a couple of dependencies on top of -next the issue
>> goes away. Any thoughts on what could be going on here?
> 
> Do you see any bad behavior other than the warning message?

I have done some more local testing and so far I don't see any bad
behaviour, just the warning.

> If you don't see any dysfunction, I guess that the difference is that
> the old code first went to the trial mode silently (with
> dev_dbg()), then switched to polling mode on the next timeout.  The
> trial mode is basically the same as polling mode, but it was considered
> to be just a temporary transition, so no warning was printed.
> 
> IOW, if my guess is correct, maybe Tegra never worked in the normal
> mode but only in the polling mode (but without complaints).
> If so, a patch like the one below would be needed.
> 
> To prove my theory, could you check the old code with dyndbg enabled
> for sound/pci/hda/hda_controller.c?  If a message like the one below
> appears, that is the case:
>   azx_get_response timeout, polling the codec once: last cmd=xxx

Yes, I tried this and you are correct, this does appear even in v5.5-rc2.

Please note that this timeout is intermittent and so does not always
happen. So it appears to work, but sometimes it can fail.
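
For reference, here is a minimal userspace sketch of the fallback
sequence described above. It is not the hda_controller.c code: the
struct, enum and function names are all hypothetical, and the "new"
behaviour (warning on the first timeout) is an assumption inferred from
the message seen on -next. It only models why a single intermittent
timeout was silent before and produces a warning now.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the RIRB timeout fallback; not kernel code. */
enum wait_mode { MODE_IRQ = 0, MODE_POLLING };

struct bus_model {
	enum wait_mode mode;   /* how responses are currently awaited */
	bool tried_poll_once;  /* old code only: one-shot poll already used */
	int warnings;          /* messages visible by default */
	int debug_msgs;        /* messages visible only with dyndbg */
};

/*
 * Old behaviour as described in this thread: the first timeout triggers
 * a silent one-time poll (debug message only); the next timeout switches
 * to polling mode and prints a warning.
 */
void old_on_timeout(struct bus_model *b)
{
	assert(b != NULL);
	if (b->mode == MODE_POLLING)
		return;
	if (!b->tried_poll_once) {
		b->tried_poll_once = true;
		b->debug_msgs++;
		return;
	}
	b->mode = MODE_POLLING;
	b->warnings++;
}

/*
 * Assumed new behaviour: the first timeout switches straight to polling
 * mode and prints the warning.
 */
void new_on_timeout(struct bus_model *b)
{
	assert(b != NULL);
	if (b->mode == MODE_POLLING)
		return;
	b->mode = MODE_POLLING;
	b->warnings++;
}
```

In this model, one timeout leaves the old bus in MODE_IRQ with only a
debug message recorded, while the new bus ends up in MODE_POLLING with
one warning recorded.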

> --- a/sound/pci/hda/hda_tegra.c
> +++ b/sound/pci/hda/hda_tegra.c
> @@ -394,6 +394,7 @@ static int hda_tegra_create(struct snd_card *card,
>  	if (err < 0)
>  		return err;
>  
> +	chip->bus.core.polling = 1;
>  	chip->bus.core.needs_damn_long_delay = 1;
>  
>  	err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops);

I don't think we want to do this, because so far this has only been seen
on one Tegra device and this would enable polling for all of them.

For now you can ignore this report and we will investigate what is
happening on Tegra194 to cause this.

Thanks
Jon
