On 4/12/21 12:36 PM, Roy Spliet wrote:
Hello Aaron,
Thanks for your insights. A follow-up query and some observations
in-line.
Op 12-04-2021 om 20:06 schreef Aaron Plattner:
On 4/10/21 1:48 PM, Roy Spliet wrote:
Op 10-04-2021 om 20:23 schreef Lukas Wunner:
On Sat, Apr 10, 2021 at 04:51:27PM +0100, Roy Spliet wrote:
Can I ask someone with more
technical knowledge of snd_hda_intel and vgaswitcheroo to
brainstorm about
the possible challenges of nouveau taking matters into its own
hand rather
than keeping this PCI quirk around?
It sounds to me like the HDA is not powered if no cable is plugged in.
What is reponsible then for powering it up or down, firmware code on
the GPU or in the host's BIOS?
Sometimes the BIOS, but definitely unconditionally the PCI quirk
code:
https://github.com/torvalds/linux/blob/master/drivers/pci/quirks.c#L5289
(CC Aaron Plattner)
My basic understanding is that the audio function stops responding
whenever the graphics function is powered off. So the requirement
here is that the audio driver can't try to talk to the audio function
while the graphics function is asleep, and must trigger a graphics
function wakeup before trying to communicate with the audio function.
I believe that vgaswitcheroo takes care of this for us.
I think there are also requirements about the audio function needing
to be awake when the graphics driver is updating the ELD, but I'm not
sure.
This is harder on Windows because the audio driver lives in its own
little world doing its own thing but on Linux we can do better.
Ideally, we should try to find out how to control HDA power from the
operating system rather than trying to cooperate with whatever
firmware
is doing. If we have that capability, the OS should power the HDA up
and down as it sees fit.
After system boot, I don't think there's any firmware involved, but
I'm not super familiar with the low-level details and it's possible
the situation changed since I last looked at it.
I think the problem with having nouveau write this quirk is that the
kernel will need to re-probe the PCI device to notice that it has
suddenly become a multi-function device with an audio function, and
hotplug the audio driver. I originally looked into trying to do that
but it was tricky because the PCI subsystem didn't really have a
mechanism for a single-function device to become a multi-function
device on the fly and it seemed easier to enable it early on during
bus enumeration. That way the kernel sees both functions all the time
without anything else having to be special about this configuration.
Right, so for a little more context: a while ago I noticed that my
laptop (lucky me, Asus K501UB) has a 940M with HDA but no codec. Seems
legit, given how this GPU has no displays attached; they're all hooked
up to the Intel integrated GPU. That threw off the snd_hda_intel
mid-probe, and as a result didn't permit runpm, keeping the entire
GPU, PCIe bus and thus the CPU package awake. A bit of hackerly later
we decided to continue probing without a codec, and now my laptop is
happy, but...
What is the PCI class of the GPU in your system? If it has no display
outputs it's probably 0x302 ("3D Controller") rather than 0x300 ("VGA
Controller"). Looking at the code it looks like this workaround is being
applied to both but maybe it should be restricted to just VGA controllers.
-- Aaron
A new problem popped up with several other NVIDIA GPUs that expose
their HDA subdevice, but somehow its inaccessible. Relevant lines from
a users' log:
[ 3.031222] MXM: GUID detected in BIOS
[ 3.031280] ACPI BIOS Error (bug): AE_AML_PACKAGE_LIMIT, Index
(0x000000003) is beyond end of object (length 0x0)
(20200925/exoparg2-393)
[ 3.031352] ACPI Error: Aborting method \_SB.PCI0.GFX0._DSM due to
previous error (AE_AML_PACKAGE_LIMIT) (20200925/psparse-529)
[ 3.031419] ACPI: \_SB_.PCI0.GFX0: failed to evaluate _DSM (0x300b)
[ 3.031424] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type
mismatch - Found [Buffer], ACPI requires [Package]
(20200925/nsarguments-61)
[ 3.031619] pci 0000:00:02.0: optimus capabilities: enabled, status
dynamic power,
[ 3.031667] ACPI BIOS Error (bug): AE_AML_PACKAGE_LIMIT, Index
(0x000000003) is beyond end of object (length 0x0)
(20200925/exoparg2-393)
[ 3.031731] ACPI Error: Aborting method \_SB.PCI0.GFX0._DSM due to
previous error (AE_AML_PACKAGE_LIMIT) (20200925/psparse-529)
[ 3.031791] ACPI Error: Aborting method \_SB.PCI0.PEG0.PEGP._DSM
due to previous error (AE_AML_PACKAGE_LIMIT) (20200925/psparse-529)
[ 3.031856] ACPI: \_SB_.PCI0.PEG0.PEGP: failed to evaluate _DSM
(0x300b)
[ 3.031859] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4
type mismatch - Found [Buffer], ACPI requires [Package]
(20200925/nsarguments-61)
[ 3.032058] pci 0000:01:00.0: optimus capabilities: enabled, status
dynamic power,
[ 3.032061] VGA switcheroo: detected Optimus DSM method
\_SB_.PCI0.PEG0.PEGP handle
[ 3.032323] checking generic (d0000000 410000) vs hw (f6000000
1000000)
[ 3.032325] checking generic (d0000000 410000) vs hw (e0000000
10000000)
[ 3.032326] checking generic (d0000000 410000) vs hw (f0000000
2000000)
[ 3.032410] nouveau 0000:01:00.0: NVIDIA GK107 (0e71f0a2)
[ 3.042385] nouveau 0000:01:00.0: bios: version 80.07.a0.00.11
--- snip ---
[ 8.951478] snd_hda_intel 0000:01:00.1: can't change power state
from D3cold to D0 (config space inaccessible)
[ 8.951509] snd_hda_intel 0000:01:00.1: can't change power state
from D3hot to D0 (config space inaccessible)
[ 8.951608] snd_hda_intel 0000:01:00.1: Disabling MSI
[ 8.951621] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio
client
[ 8.952461] snd_hda_intel 0000:00:1b.0: bound 0000:00:02.0 (ops
i915_audio_component_bind_ops [i915])
[ 8.952642] snd_hda_intel 0000:01:00.1: number of I/O streams is
30, forcing separate stream tags
Now I don't know what's going on, but the snd_hda_intel messages are
ominous. And so are the ACPI warnings. But I don't know how much these
two are related.
You say that it is desirable to switch on HDA at boot-time because the
PCI subsystem doesn't play nicely with changing a device to
multi-function. That rules out the option of only enabling the HDA
device once a cable is plugged in. Are there any other trap doors that
snd_hda_intel needs to navigate around to make this work fault free on
all hardware, such as:
- Codecs not revealing themselves until a display is plugged in,
requiring perhaps a "codec reprobe" and "codec remove" event from
nouveau/rm to snd_hda_intel,
- Borked BIOSes just blindly assigning the MMIO space of the HDA
device to another device, or nothing at all,
- ... other things that might give any of us nightmares and heart burn?
Thanks!
Roy
-- Aaron
Thanks,
Lukas