On 09.01.24 at 01:00, Bjorn Helgaas wrote:
On Sat, Jan 06, 2024 at 11:33:35PM +0100, Armin Wolf wrote:
On 04.01.24 at 03:50, Athul Krishna wrote:
On Thursday, January 4th, 2024 at 1:05 AM, Armin Wolf <W_Armin@xxxxxx> wrote:
On 03.01.24 at 19:51, Athul Krishna wrote:
Hello,
This is my first time reporting an issue in the kernel.
Device Details:
* Asus Zephyrus G14 (GA402RJ)
* Latest BIOS
* Arch Linux (x86_64)
* Kernel: 6.6.9
* Minimal install using archinstall
ISSUE: Using dgpu_disable, provided by asus-nb-wmi, to disable and
enable the dedicated GPU causes the system to crash and reboot at random.
9 out of 10 times, writing 0 to dgpu_disable produces an Input/Output
error, but the value is still changed to 0, and half the time the system
then crashes and reboots. Writing 1 does not produce an error, but I
have observed the system crash twice right after doing so.
Steps to Reproduce:
* Remove the dGPU: echo 1 | sudo tee ../remove (dGPU path)
* echo 1 | sudo tee /sys/devices/platform/asus-nb-wmi/dgpu_disable
* echo 0 | sudo tee /sys/devices/platform/asus-nb-wmi/dgpu_disable
* echo 1 | sudo tee /sys/bus/pci/rescan
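For scripted testing, the four steps above could be driven from a small C program. This is a minimal sketch, not a tool from the thread: write_attr() and run_repro_steps() are hypothetical names, and the dGPU "remove" path is passed in by the caller because the report does not give it in full. Like the shell sequence (tee reports the error and the next command still runs), it keeps going after a failed write, so the rescan still happens even if writing 0 to dgpu_disable fails with EIO. It has to run as root to write to these sysfs files.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a value to a sysfs attribute, similar to
 * "echo 1 | sudo tee <path>". Returns 0 on success, -1 on failure.
 * fclose() is checked as well, because the buffered data is only written
 * (and a store() error such as -EIO only reported) when the file is flushed.
 */
static int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Hypothetical reproducer: performs the four steps in order and always
 * runs all of them. Returns 0 if every write succeeded, otherwise the
 * 1-based number of the first step that failed.
 */
static int run_repro_steps(const char *dgpu_remove_path,
                           const char *dgpu_disable_path,
                           const char *rescan_path)
{
    const char *paths[4] = { dgpu_remove_path, dgpu_disable_path,
                             dgpu_disable_path, rescan_path };
    const char *values[4] = { "1\n", "1\n", "0\n", "1\n" };
    int first_failure = 0;

    for (int i = 0; i < 4; i++) {
        if (write_attr(paths[i], values[i]) != 0 && first_failure == 0)
            first_failure = i + 1;
    }
    return first_failure;
}
```

Called with the dGPU's remove attribute, /sys/devices/platform/asus-nb-wmi/dgpu_disable and /sys/bus/pci/rescan, a return value of 3 would correspond to the Input/Output error reported when writing 0.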
After writing 0 to dgpu_disable, there is an entry in the journal about
an ACPI bug.
The outputs of 'journalctl -p 3' and lspci are attached.
Hi,
Can you share the output of "acpidump" and the content of "/sys/bus/wmi/devices/05901221-D566-11D1-B2F0-00A0C9062910[-X]/bmof"?
The bmof files contain a description of the WMI interfaces of your machine, which might be important for diagnosing the error.
Thanks,
Armin Wolf
Here's the output of 'acpidump > acpidump.out' and 'cat /sys/bus/wmi/devices/05901221-D566-11D1-B2F0-00A0C9062910[-X]/bmof'
OK, it seems the ACPI code tries to access an object ("GC00") which does not exist.
This is the reason why disabling the dGPU fails with -EIO.
Unfortunately, I am not that knowledgeable about PCI problems, so I CCed the linux-pci mailing list in the hope that
they can help you better in this regard.
FWIW, I don't know enough about what's going on here to see a PCI
connection. I do see a bunch of PCI-related stuff around rfkill, but
I don't think that's involved here.
I think the path here is this, which doesn't seem to touch anything in
PCI:
  dgpu_disable_store
    asus_wmi_set_devstate(ASUS_WMI_DEVID_DGPU, ..., &result)
      asus_wmi_evaluate_method(ASUS_WMI_METHODID_DEVS, ...)
        asus_wmi_evaluate_method3
          wmi_evaluate_method(ASUS_WMI_MGMT_GUID, ...)
    if (result > 1)
      return -EIO
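To make the -EIO path in that call chain concrete, here is a minimal, hardware-free C sketch of the final check. fake_wmi, fake_set_devstate() and dgpu_disable_store_sketch() are hypothetical names, not the asus-wmi driver's code: the stub stands in for the whole asus_wmi_set_devstate() to wmi_evaluate_method() chain and simply hands back a chosen result code. Propagating an evaluation error before the result > 1 check is an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of what the WMI DEVS call reports back. */
struct fake_wmi {
    int err;         /* error from evaluating the method (0 = evaluated) */
    uint32_t result; /* result value returned by the firmware method */
};

/* Hypothetical stand-in for asus_wmi_set_devstate() and everything below it. */
static int fake_set_devstate(const struct fake_wmi *fw, uint32_t value,
                             uint32_t *result)
{
    (void)value; /* the value is not modeled; only the result matters here */
    if (fw->err)
        return fw->err;
    *result = fw->result;
    return 0;
}

/*
 * Sketch of the store path from the call chain: an evaluation error is
 * passed through (assumption), and a firmware result greater than 1 is
 * turned into -EIO.
 */
static int dgpu_disable_store_sketch(const struct fake_wmi *fw,
                                     uint32_t disable)
{
    uint32_t result = 0;
    int err = fake_set_devstate(fw, disable, &result);

    if (err)
        return err;
    if (result > 1)
        return -EIO;
    return 0;
}
```

In this model a firmware result of 0 or 1 succeeds, while a result of 2 yields -EIO, which userspace sees as the "Input/output error" printed by tee.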
But if I missed it, let me know and I'll be happy to take another
look.
Bjorn
The issue happens when a PCI bus rescan is done after writing to "dgpu_disable".
As a side note, a Bugzilla bug report for this issue was recently created:
https://bugzilla.kernel.org/show_bug.cgi?id=218354
Thanks,
Armin Wolf