On Mon, 30 Sep 2024 at 08:43:46 +0200, Takashi Iwai wrote:
> On Sat, 21 Sep 2024 10:10:37 +0200,
> Maxim Mikityanskiy wrote:
> >
> > On Sat, 07 Sep 2024 at 20:10:44 +0200, Takashi Iwai wrote:
> > > On Sat, 07 Sep 2024 19:24:41 +0200,
> > > Maxim Mikityanskiy wrote:
> > > >
> > > > On Sat, 07 Sep 2024 at 17:49:11 +0200, Takashi Iwai wrote:
> > > > > On Sat, 07 Sep 2024 15:45:41 +0200,
> > > > > Maxim Mikityanskiy wrote:
> > > > > >
> > > > > > On Sat, 07 Sep 2024 at 12:06:25 +0200, Takashi Iwai wrote:
> > > > > > > On Fri, 06 Sep 2024 20:05:06 +0200,
> > > > > > > Maxim Mikityanskiy wrote:
> > > > > > > >
> > > > > > > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote:
> > > > > > > > > On Wed, 04 Sep 2024 22:39:55 +0200,
> > > > > > > > > Maxim Mikityanskiy wrote:
> > > > > > > > > >
> > > > > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on
> > > > > > > >
> > > > > > > > Spotted a typo: s/520M/540M/
> > > > > > > >
> > > > > > > > > > the discrete GPU. snd_hda_intel probes the device and schedules
> > > > > > > > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends
> > > > > > > > > > up probed, but calls azx_free() and stops the chip. However, from the
> > > > > > > > > > runtime PM point of view, the device remains active, because the PCI
> > > > > > > > > > subsystem makes it active on probe, and it's still bound. It prevents
> > > > > > > > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs
> > > > > > > > > > power management for the video and audio devices).
> > > > > > > > > >
> > > > > > > > > > Fix it by forcing the device to the suspended state in azx_free().
> > > > > > > > > >
> > > > > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller")
> > > > > > > > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@xxxxxxxxx>
> > > > > > > > >
> > > > > > > > > What if this device probe is skipped (e.g. adding your device entry to
> > > > > > > > > driver_denylist[] in hda_intel.c)? Is the device also in the
> > > > > > > > > runtime-suspended state?
> > > > > > > >
> > > > > > > > I added the following:
> > > > > > > >
> > > > > > > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) },
> > > > > > > >
> > > > > > > > The probe was apparently skipped (the device is not attached to a
> > > > > > > > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to
> > > > > > > > DynOff.
> > > > > > >
> > > > > > > OK, that's good.
> > > > > > >
> > > > > > > > However, I'm not sure whether it's a good idea to blacklist 540M
> > > > > > > > entirely, as there might be other laptops with this GPU that have sound,
> > > > > > > > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs.
> > > > > > >
> > > > > > > You should specify the PCI SSID of your device instead of 0:0 in the
> > > > > > > above, so that it's picked up only for yours.
> > > > > >
> > > > > > Where can I get it?
> > > > > >
> > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_vendor
> > > > > > 0x0000
> > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_device
> > > > > > 0x0000
> > > > >
> > > > > Ouch, Lenovo didn't set it right.
> > > > > Alternatively we may introduce a deny list based on DMI. Hmm...
> > > >
> > > > A DMI-based quirk sounds better than blocking any NVIDIA 540M, it would
> > > > also allow handling alternative GPUs on this laptop model.
> > > >
> > > > But could you explain what's wrong with a generic approach that I
> > > > suggest (probe_continue failed => mark as suspended)? It doesn't require
> > > > any lists. Any drawbacks?
> > >
> > > As you noticed, it will leave the device bound with driver, i.e. this
> > > looks as if it were operative. It means that the sound driver is
> > > still responsible for suspend/resume or whatever action even though
> > > it's totally useless. In that sense, it makes more sense to give it
> > > away at the early stage before actually binding it, and the deny list
> > > is exactly for that.
> >
> > Thanks for your feedback!
>
> Sorry for the late reply, as I've been off in the last weeks.
>
> > Do you happen to know whether there is a way for a driver to unbind
> > itself after probe() returned success? I've never seen anything like
> > this, but it would be a perfect solution: no lists and no driver bound
> > after the delayed initialization fails.
>
> I don't think it's possible for now (at least in a clean way).
> It should be a call of device_driver_detach(), but it's no exported
> API.
>
> > > Apart from that, the suggested change itself can be applied
> > > independently from the deny list update, too. It'd be good for other
> > > similar devices, too.
> >
> > I was trying to avoid lists, because it's a maintainance nightmare (the
> > lists are never complete, there might be false positives, etc.), and
> > we'd need to add another type of a list for DMI quirks. Moreover, adding
> > both my change and a list entry has a drawback that similarly broken
> > devices will use different workaround code (i.e. less robust), and the
> > ones that use the generic fallback will never get on the list, because
> > no one will report it.
> >
> > As a maintainer, what's your preferred approach?
> >
> > 1. Unbind ourselves on failure in azx_probe_continue() (but I don't know
> > whether it's possible).
>
> So it's no easy way.
>
> > 2. Add a DMI deny list to hda_intel.
> >
> > 3. Add a DMI check to quirk_nvidia_hda (i.e. don't enable a non-existent
> > HDA if it's not enabled by BIOS on this laptop model).
>
> I'm afraid that it may break other systems; IIRC, the power up was
> needed explicitly even if it's disabled on BIOS.
I'm not suggesting full removal of quirk_nvidia_hda. If we add a DMI
check to disable quirk_nvidia_hda on this particular laptop model, it
can't break other systems, can it?
Anyways, adding a DMI check to hda_intel or quirk_nvidia_hda seems the
same level of efforts, so I can proceed with the former. It's the same
power consumption either way, right?
I'm also thinking of adding a module parameter to block probing of the
DGPU audio. Back in the days, there were plenty of similar Lenovo
laptops, which might also suffer from the same issue.
> > 4. Driver becomes a stub no failure in azx_probe_continue() (this
> > patch).
>
> It's OK, but slightly risky. It's kept bound, hence any unexpected
> thing might happen if the code is changed without noticing it.
>
> > 5. Some combo of 4 and 2/3?
>
> Not safe.
>
> So, from the safety POV, I'd rather take the DMI deny list approach.
>
>
> thanks,
>
> Takashi
>
> > > Though, a slight concern is the sequence of runtime PM you applied in
> > > the patch. Is it a standard idiom to perform pm_runtime_disable(),
> > > set_suspended() and enable()?
> >
> > I'm not an expert in PM. Before I wrote this code, I was meditating on
> > the docs for a few hours, and what I understood was:
> >
> > 1. As we are disabling the device (below in the same function), we need
> > set_suspended() to just update the state.
> >
> > 2. set_suspended() must be called with PM disabled (see the comment
> > above the function).
> >
> > 3. Disable and enable must come in pairs, because there is refcounting.
> >
> > I've seen set_suspended() after disable() on device remove (e.g.,
> > rt9120_remove()). For hda_intel, assuming that disable/enable are
> > already paired, adding a new disable without a matching enable would
> > break refcounting. Hence such a sequence.
> >
> > I'd love to hear an opinion from someone more experienced in PM.
> >
> > > Also, azx_free() is the common
> > > destructor, hence it's called also at the regular driver unbinding.
> > > I'm not entirely sure whether it's OK to call there also at
> > > unbinding.
> >
> > I don't have hardware to test this flow, but theoretically it looks OK
> > to me: we set the suspended state as we disable the device.
> >
> > >
> > > thanks,
> > >
> > > Takashi
> > >
> > > >
> > > > >
> > > > >
> > > > > Takashi
> > > > >
> > > > >
> > > > > >
> > > > > > Is it not the right place?
> > > > > >
> > > > > > >
> > > > > > > Takashi
> > > > > > >
> > > > > > > > Another way to make vga_switcheroo work is to disable quirk_nvidia_hda,
> > > > > > > > although I don't know whether it can be done without recompiling the
> > > > > > > > kernel. In this case, 0000:01:00.1 doesn't even appear on the bus.
> > > > > > > >
> > > > > > > > (Note that I need to set nouveau.modeset=2 either way, otherwise it
> > > > > > > > stays in DynPwr if the screen is on.)
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > thanks,
> > > > > > > > >
> > > > > > > > > Takashi
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > > sound/pci/hda/hda_intel.c | 14 +++++++++++++-
> > > > > > > > > > 1 file changed, 13 insertions(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > > > > > > > > > index b79020adce63..65fcb92e11c7 100644
> > > > > > > > > > --- a/sound/pci/hda/hda_intel.c
> > > > > > > > > > +++ b/sound/pci/hda/hda_intel.c
> > > > > > > > > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip)
> > > > > > > > > > if (use_vga_switcheroo(hda)) {
> > > > > > > > > > if (chip->disabled && hda->probe_continued)
> > > > > > > > > > snd_hda_unlock_devices(&chip->bus);
> > > > > > > > > > - if (hda->vga_switcheroo_registered)
> > > > > > > > > > + if (hda->vga_switcheroo_registered) {
> > > > > > > > > > vga_switcheroo_unregister_client(chip->pci);
> > > > > > > > > > +
> > > > > > > > > > + /* Some GPUs don't have sound, and azx_first_init fails,
> > > > > > > > > > + * leaving the device probed but non-functional. As long
> > > > > > > > > > + * as it's probed, the PCI subsystem keeps its runtime
> > > > > > > > > > + * PM status as active. Force it to suspended (as we
> > > > > > > > > > + * actually stop the chip) to allow GPU to suspend via
> > > > > > > > > > + * vga_switcheroo.
> > > > > > > > > > + */
> > > > > > > > > > + pm_runtime_disable(&pci->dev);
> > > > > > > > > > + pm_runtime_set_suspended(&pci->dev);
> > > > > > > > > > + pm_runtime_enable(&pci->dev);
> > > > > > > > > > + }
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > if (bus->chip_init) {
> > > > > > > > > > --
> > > > > > > > > > 2.46.0
> > > > > > > > > >
[Index of Archives]
[Pulseaudio]
[Linux Audio Users]
[ALSA Devel]
[Fedora Desktop]
[Fedora SELinux]
[Big List of Linux Books]
[Yosemite News]
[KDE Users]