Re: ERROR: Writing to dgpu_disable causes Input/Output error

On Tue, Jan 09, 2024 at 11:29:19AM +0100, Armin Wolf wrote:
> Am 09.01.24 um 01:00 schrieb Bjorn Helgaas:
> > On Sat, Jan 06, 2024 at 11:33:35PM +0100, Armin Wolf wrote:
> > > Am 04.01.24 um 03:50 schrieb Athul Krishna:
> > > > On Thursday, January 4th, 2024 at 1:05 AM, Armin Wolf <W_Armin@xxxxxx> wrote:
> > > > > Am 03.01.24 um 19:51 schrieb Athul Krishna:
> > > > > 
> > > > > > Hello,
> > > > > > This is my first time reporting an issue in the kernel.
> > > > > > 
> > > > > > Device Details:
> > > > > > 
> > > > > > * Asus Zephyrus G14 (GA402RJ)
> > > > > > * Latest BIOS
> > > > > > * Arch Linux x86_64
> > > > > > * Kernel: 6.6.9
> > > > > > * Minimal install using archinstall
> > > > > > 
> > > > > > ISSUE: Using dgpu_disable, provided by asus-nb-wmi, to disable and
> > > > > > enable the dedicated GPU causes random system crashes and reboots.
> > > > > > 9 out of 10 times, writing 0 to dgpu_disable produces an Input/Output
> > > > > > error, but the value is still changed to 0, and half the time the
> > > > > > system then crashes and reboots. Writing 1 to it doesn't produce an
> > > > > > error, but I have observed the system crash twice just after that.
> > > > > > 
> > > > > > Steps to Reproduce:
> > > > > > 
> > > > > > * Remove dGPU: echo 1 | sudo tee ../remove (dGPU path)
> > > > > > * echo 1 | sudo tee /sys/devices/platform/asus-nb-wmi/dgpu_disable
> > > > > > * echo 0 | sudo tee /sys/devices/platform/asus-nb-wmi/dgpu_disable
> > > > > > 
> > > > > > * echo 1 | sudo tee /sys/bus/pci/rescan
> > > > > > 
> > > > > > After writing 0 to dgpu_disable, there's an entry in the journal
> > > > > > about an ACPI bug.
> > > > > > Output of 'journalctl -p 3' and lspci is attached.
> > > > > 
> > > > > Can you share the output of "acpidump" and the content of "/sys/bus/wmi/devices/05901221-D566-11D1-B2F0-00A0C9062910[-X]/bmof"?
> > > > > The bmof files contain a description of the WMI interfaces of your machine, which might be important for diagnosing the error.
> > > > > 
> > > > Here's the output of 'acpidump > acpidump.out' and 'cat /sys/bus/wmi/devices/05901221-D566-11D1-B2F0-00A0C9062910[-X]/bmof'
> > > Ok, it seems the ACPI code tries to access an object ("GC00") which does not exist.
> > > This is the reason why disabling the dGPU fails with -EIO.
> > > 
> > > I am unfortunately not that knowledgeable when it comes to PCI problems, so I CCed the linux-pci mailing list in the hope that
> > > they can better help you with this.
> >
> > FWIW, I don't know enough about what's going on here to see a PCI
> > connection.  I do see a bunch of PCI-related stuff around rfkill, but
> > I don't think that's involved here.
> > 
> > I think the path here is this, which doesn't seem to touch anything in
> > PCI:
> > 
> >    dgpu_disable_store
> >      asus_wmi_set_devstate(ASUS_WMI_DEVID_DGPU, ..., &result)
> >        asus_wmi_evaluate_method(ASUS_WMI_METHODID_DEVS, ...)
> >          asus_wmi_evaluate_method3
> >            wmi_evaluate_method(ASUS_WMI_MGMT_GUID, ...)
> >      if (result > 1)
> >        return -EIO
> 
> The issue happens when a PCI bus rescan is done after writing to "dgpu_disable".
> As a side note, a Bugzilla bug report for this issue was recently created:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=218354

Ah, the original email talked about dgpu_disable causing Input/Output
errors and random crashes just after using dgpu_disable, so it wasn't
clear to me that the PCI rescan was related.

Athul, can you capture any information about the crash, e.g., an oops
or panic message?  Possibly a screenshot or video?

Booting with kernel parameters like "ignore_loglevel boot_delay=60
lpj=3200000" (might need tweaking and depends on
CONFIG_BOOT_PRINTK_DELAY) might be needed to slow things down enough
to capture.

Bjorn



