Hi, On 8/8/24 7:22 PM, Rafael J. Wysocki wrote: > Hi, > > On Mon, Aug 5, 2024 at 2:47 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote: >> >> Hi, >> >> On 8/5/24 1:28 PM, Hans de Goede wrote: >>> Hi, >>> >>> On 8/1/24 4:28 PM, Hans de Goede wrote: >>>> Hi, >>>> >>>> On 7/29/24 1:15 PM, Hans de Goede wrote: >>>>> Hi Rafael, >>>>> >>>>> There are 2 bug reports: >>>>> >>>>> 1. Brightness up/down key-presses no longer working on LG laptop (acpi-video related): >>>>> https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/thread/V2KWAGZIAX4TOWPCH6A6FSIT66PR3KMZ/ >>>> >>>> I have filed: >>>> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=2302253 >>>> >>>> to track this now and an acpidump of the troublesome LG laptop >>>> is attached there. I have also requested dmesg output of >>>> a non working kernel to be attached there. >>>> >>>> As a reminder this is the bug where it has been confirmed that >>>> reverting "EC: Install address space handler at the namespace root" >>>> helps, with the caveat that there is a Thunderbolt related IRQ >>>> storm on the ACPI event IRQ after the revert ... >>> >>> Ok, so the bugzilla now has 2 different dmesg outputs: >>> >>> 1. 6.9.6, this kernel works without problems >>> >>> 2. 6.9.12 with the following patch you suggested on top: >>> >>> --- a/drivers/acpi/ec.c >>> +++ b/drivers/acpi/ec.c >>> @@ -1788,7 +1788,7 @@ void __init acpi_ec_dsdt_probe(void) >>> * At this point, the GPE is not fully initialized, so do not to >>> * handle the events. >>> */ >>> - ret = acpi_ec_setup(ec, NULL, true); >>> + ret = acpi_ec_setup(ec, NULL, false); >>> if (ret) { >>> acpi_ec_free(ec); >>> return; >>> >>> Unfortunately this does not help. dmesg shows some EC _REG errors, which >>> are now (with the above diff applied) shown just before the >>> "Boot DSDT EC initialization complete" message, which shows that _REG now >>> runs from acpi_ec_add() rather then before: >>> >>> [ 1.007566] ACPI BIOS Error (bug): Could not resolve symbol [\_TZ.FN00._OFF], AE_NOT_FOUND (20230628/psargs-330) >>> [ 1.007576] ACPI Error: Aborting method \_SB.PC00.LPCB.H_EC.EREG due to previous error (AE_NOT_FOUND) (20230628/psparse-52 >>> [ 1.007580] ACPI Error: Aborting method \_SB.PC00.LPCB.H_EC._REG due to previous error (AE_NOT_FOUND) (20230628/psparse-52 >>> [ 1.007639] ACPI: EC: interrupt unblocked >>> [ 1.007640] ACPI: EC: event unblocked >>> [ 1.007675] ACPI: EC: EC_CMD/EC_SC=0x66, EC_DATA=0x62 >>> [ 1.007676] ACPI: EC: GPE=0x6e >>> [ 1.007677] ACPI: \_SB_.PC00.LPCB.LGEC: Boot DSDT EC initialization complete >>> [ 1.007679] ACPI: \_SB_.PC00.LPCB.LGEC: EC: Used to handle transactions and events >>> >>> Note that the errors are from calling _REG on \_SB.PC00.LPCB.H_EC, where as the actual >>> EC (and the only acpi_device on which _REG would get called for the EC Opregion before) is: >>> \_SB_.PC00.LPCB.LGEC. >>> >>> Looking at the DSDT it seems that the H_EC is not used and is leftover from a copy/paste >>> from some reference design DSDT. Its _REG however does write to the EC before hitting the error >>> and I think that that write may be causing the issue... >>> >>> The H_EC device does have an _STA method and looking closer the troublesome EREG method is >>> also called from _INI. So I guess that _STA is returning 0 causing _INI to not run and >>> that is the reason why we are not seeing the same EREG errors with kernel 6.9.6 where _REG is >>> only called for the EC opregion on \_SB_.PC00.LPCB.LGEC and not for the entire ACPI device >>> hierarchy as is done with >= 6.9.7 . >>> >>> Maybe we should only call _REG for the EC opregion on present devices (and devices without >>> a _STA)? >>> >>> Also note that both LGEC and H_EC use the same cmd + data ports. >>> >>> I'll go and ask the reporter to retrieve the status of both LGEC and H_EC and then see >>> from there. >> >> The reporter has confirmed that of the 2 EC devices ( H_EC / LGEC ) only LGEC returns >> a _STA of non 0: >> >>> Here it is, with kernel 6.9.6: >>> >>> $ cat /sys/bus/acpi/devices/PNP0C09\:00/path >>> \_SB_.PC00.LPCB.H_EC >>> $ cat /sys/bus/acpi/devices/PNP0C09\:00/status >>> 0 >>> $ cat /sys/bus/acpi/devices/PNP0C09\:01/path >>> \_SB_.PC00.LPCB.LGEC >>> $ cat /sys/bus/acpi/devices/PNP0C09\:01/status >>> 15 >> >> And taking a second look at the other bug: >> https://bugzilla.redhat.com/show_bug.cgi?id=2298938 >> >> That one also has 2 EC ACPI devices and the errors come from calling _REG on the one >> which is not picked as the boot_ec : >> >> jul 19 17:33:41 kernel: ACPI: EC: EC started >> jul 19 17:33:41 kernel: ACPI: EC: interrupt blocked >> jul 19 17:33:41 kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.H_EC.ECMD due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) >> jul 19 17:33:41 kernel: ACPI Error: Aborting method \_TZ.FNCL due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) >> jul 19 17:33:41 kernel: ACPI Error: Aborting method \_TZ.FN00._OFF due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) >> jul 19 17:33:41 kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.H_EC._REG due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) >> jul 19 17:33:41 kernel: ACPI: EC: EC_CMD/EC_SC=0x66, EC_DATA=0x62 >> jul 19 17:33:41 kernel: ACPI: \_SB_.PCI0.LPCB.EC0_: Boot DSDT EC used to handle transactions >> >> Note the H_EC vs EC0_ in the errors vs the "Boot DSDT EC used to >> handle transactions" message. >> >> So the issue in both cases seems to be calling _REG on an unused EC acpi_device. >> Not sure how to best fix this though ... > > I have created an experimental acpi-ec-fixes branch: > > https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=acpi-ec-fixes > > for this which illustrates my idea (untested so far). Thank you. I believe that the approach taken here is good and I also like the code (of the current version) so you may add my: Reviewed-by: Hans de Goede <hdegoede@xxxxxxxxxx> to all 3 patches. I have started a test Fedora 40 6.10.4 kernel build with the patches from the acpi-ec-fixes branch added: https://koji.fedoraproject.org/koji/taskinfo?taskID=121834209 and I have asked the reporters of both bugs: 2298938 - Multiple ACPI errors resulting in incorrect thermal readings and misleading CPU 2302253 - ACPI: EC: LG gram laptop brightness keys stop working with kernel >= 6.9.7 to test this. I expect a good turn around time from the reporter of bug 2302253. So far the reporter of 2298938 is not really responsive (holidays?). Regards, Hans