On Wed, 10 Nov 2021 23:15:40 +0100,
Kai Vehmanen wrote:
>
> Hey,
>
> On Wed, 10 Nov 2021, Takashi Iwai wrote:
>
> > On Wed, 10 Nov 2021 22:03:07 +0100,
> > Kai Vehmanen wrote:
> > > Fix a corner case between PCI device driver remove callback and
> > > runtime PM idle callback.
> [...]
> > > Some non-persistent direct links showing the bug trigger on
> > > different platforms with linux-next 20211109:
> > > - https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/fi-tgl-1115g4/igt@i915_module_load@xxxxxxxxxxx
> > > - https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/fi-jsl-1/igt@i915_module_load@xxxxxxxxxxx
> > >
> > > Notably with 20211110 linux-next, the bug does not trigger:
> > > - https://intel-gfx-ci.01.org/tree/linux-next/next-20211110/fi-tgl-1115g4/igt@i915_module_load@xxxxxxxxxxx
> >
> > Is this the case with CONFIG_DEBUG_KOBJECT_RELEASE?
> > This would be the only logical explanation I can think of for now.
>
> hmm, that doesn't seem to be used. Here's a link to kconfig used in the
> failing CI run:
> https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/kconfig.txt

OK, then it's not due to the delayed release, but I suppose the cause
should be the same.

> It's still a bit odd, especially given Scott just reported the other HDA
> related regression in 5.15 today. The two issues don't seem to be related
> though, although both are fixed by clearing drvdata (but in different
> places of hda_intel.c).

I don't think it's the same issue; it's rather a coincidence of timing.
There have been many changes in 5.15, after all :)

> I'll try to run some more tests tomorrow. The fix should be good in any
> case, but it would be interesting to understand better what change made
> this more (?) likely to hit than before. This is not a new test and the
> problem happens on fairly old platforms, so something has changed.

A potential problem with the current code is that it doesn't disable
runtime PM in the release procedure. Could you try the patch below?
You can also put a WARN_ON(!chip) in azx_runtime_idle() to catch the
invalid runtime PM call.


thanks,

Takashi

--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1347,8 +1347,13 @@ static void azx_free(struct azx *chip)
 	if (hda->freed)
 		return;
 
-	if (azx_has_pm_runtime(chip) && chip->running)
+	if (azx_has_pm_runtime(chip) && chip->running) {
 		pm_runtime_get_noresume(&pci->dev);
+		pm_runtime_forbid(&pci->dev);
+		pm_runtime_dont_use_autosuspend(&pci->dev);
+		pm_runtime_disable(&pci->dev);
+	}
+
 	chip->running = 0;
 
 	azx_del_card_list(chip);
@@ -2320,6 +2325,7 @@ static int azx_probe_continue(struct azx *chip)
 	set_default_power_save(chip);
 
 	if (azx_has_pm_runtime(chip)) {
+		pm_runtime_enable(&pci->dev);
 		pm_runtime_use_autosuspend(&pci->dev);
 		pm_runtime_allow(&pci->dev);
 		pm_runtime_put_autosuspend(&pci->dev);