Re: ACPI IRQ storm with 6.10

CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)

On 17. 08. 24, 19:57, Petr Valenta wrote:


On 16. 08. 24, 20:29, Rafael J. Wysocki wrote:
On Wed, Aug 14, 2024 at 8:48 AM Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:

On 14. 08. 24, 7:22, Jiri Slaby wrote:
Hi,

an openSUSE user reported that with 6.10, he sees one CPU under an
IRQ storm from ACPI (sci_interrupt):
     9:   20220768          ...  IR-IO-APIC    9-fasteoi   acpi

At:
https://bugzilla.suse.com/show_bug.cgi?id=1229085

6.9 was OK.

With acpi.debug_level=0x08000000 acpi.debug_layer=0xffffffff, there is a
repeated load of:
evgpe-0673 ev_detect_gpe         : Read registers for GPE 6D:
Status=20, Enable=00, RunEnable=4A, WakeEnable=00

0x6d seems to count excessively (10 snapshots every 1 second):
/sys/firmware/acpi/interrupts/gpe6D:   82066  EN STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D:   86536  EN STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D:   90990     STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D:   95468  EN STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D:  100282  EN STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D:  105187     STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D:  110014     STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D:  114852     STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D:  119682     STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D:  124194     STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D:  128641  EN STS enabled unmasked
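
For reference, the counter grows by roughly 4,400 to 4,900 between consecutive snapshots. A minimal sketch for computing those deltas, assuming snapshot lines in the format shown above (path, colon, counter, flags); the helper name gpe_deltas is made up for illustration:

```shell
#!/bin/sh
# gpe_deltas: hypothetical helper, not an existing tool.
# Reads snapshot lines such as
#   /sys/firmware/acpi/interrupts/gpe6D:   82066  EN STS enabled unmasked
# on stdin and prints the counter difference between consecutive lines.
gpe_deltas() {
    awk '{
        count = $2                      # second field is the GPE counter
        if (NR > 1) print count - prev  # skip the first line, nothing to compare
        prev = count
    }'
}

# Possible use on the affected machine (one snapshot per second):
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#       grep -H . /sys/firmware/acpi/interrupts/gpe6D; sleep 1
#   done | gpe_deltas
```

With one snapshot per second, each printed number is the GPE rate per second for that interval.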

acpidump:
https://bugzilla.suse.com/attachment.cgi?id=876677

DSDT:
https://bugzilla.suse.com/attachment.cgi?id=876678

Any ideas?

GPE 6D is listed in _PRW for some devices, so maybe one of them
continues to trigger wakeup events?


Disabling the powertop service (which calls /usr/sbin/powertop --auto-tune) solves the problem completely. After some searching, I found that this is the cause:

# causes IRQ storm on 6.10.x
# kernel 6.9.9 is immune
echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control
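
To check the trigger in both directions without touching the powertop service, the control file can be flipped between 'auto' and 'on' ('on' keeps the device out of runtime PM). A rough sketch, assuming the device address from this report; the helper name set_pci_runtime_pm and its optional root argument are hypothetical and exist only so the function can be tried against a scratch directory. Whether writing 'on' actually stops an already running storm is untested here:

```shell
#!/bin/sh
# set_pci_runtime_pm: hypothetical helper, not an existing tool.
#   $1  mode: "on" (runtime PM disabled) or "auto" (runtime PM allowed)
#   $2  PCI address, defaults to the NIC from this report
#   $3  sysfs root, defaults to the real location (override for dry runs)
set_pci_runtime_pm() {
    mode=$1
    dev=${2:-0000:00:1f.6}
    root=${3:-/sys/bus/pci/devices}
    case $mode in
        on|auto) ;;
        *) echo "usage: set_pci_runtime_pm on|auto [device] [root]" >&2
           return 1 ;;
    esac
    printf '%s\n' "$mode" > "$root/$dev/power/control" || return 1
    # Print the resulting runtime PM state if the kernel exposes it.
    if [ -r "$root/$dev/power/runtime_status" ]; then
        cat "$root/$dev/power/runtime_status"
    fi
}

# As root on the affected machine:
#   set_pci_runtime_pm auto    # expected to start the storm on 6.10.x
#   set_pci_runtime_pm on      # back to the default setting
```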

$ git log --no-merges --oneline v6.9..v6.10 drivers/net/ethernet/intel/e1000e/
76a0a3f9cc2f e1000e: fix force smbus during suspend flow
c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
75a3f93b5383 net: intel: implement modern PM ops declarations

The last two touch PM ^^. I cannot immediately see whether either of them could cause the issue, though.

If there are no other ideas, perhaps try reverting both?

lspci | grep 1f.6
00:1f.6 Ethernet controller: Intel Corporation Device 550b (rev 20)

journalctl -b | grep 1f.6
srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: [8086:550b] type 00 class 0x020000 conventional PCI endpoint
srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: BAR 0 [mem 0x9c300000-0x9c31ffff]
srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: PME# supported from D0 D3hot D3cold
srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: Adding to iommu group 12
srp 17 19:44:19 e14 kernel: e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
srp 17 19:44:19 e14 kernel: e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) fc:5c:ee:b0:13:74
srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: MAC: 16, PHY: 12, PBA No: FFFFFF-0FF
srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 enp0s31f6: renamed from eth0
srp 17 19:44:24 e14 ModemManager[1434]: <info>  [base-manager] couldn't check support for device '/sys/devices/pci0000:00/0000:00:1f.6': not supported by any plugin



You can ask the reporter to mask that GPE via "echo mask >
/sys/firmware/acpi/interrupts/gpe6D" and see if the storm goes away
then.
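
The masking test could be wrapped up along these lines. This is only a sketch: the helper name set_gpe_mask and its optional directory argument are hypothetical (the latter is there so the function can be dry-run against a scratch directory), and the writes need root on a real system:

```shell
#!/bin/sh
# set_gpe_mask: hypothetical helper, not an existing tool.
#   $1  action: "mask" or "unmask"
#   $2  GPE file name, defaults to the one from this report
#   $3  directory, defaults to the real sysfs location
set_gpe_mask() {
    action=$1
    gpe=${2:-gpe6D}
    dir=${3:-/sys/firmware/acpi/interrupts}
    case $action in
        mask|unmask) ;;
        *) echo "usage: set_gpe_mask mask|unmask [gpe] [dir]" >&2
           return 1 ;;
    esac
    printf '%s\n' "$action" > "$dir/$gpe"
}

# As root on the affected machine:
#   set_gpe_mask mask
#   grep acpi /proc/interrupts; sleep 5; grep acpi /proc/interrupts
#   set_gpe_mask unmask
```

If the IRQ 9 count in /proc/interrupts stops climbing while the GPE is masked, that would point at GPE 6D as the source of the storm.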

The only ACPI core issue introduced between 6.9 and 6.10 I'm aware of
is the one addressed by this series

https://lore.kernel.org/linux-acpi/22385894.EfDdHjke4D@xxxxxxxxxxxxx/

but this is about the EC and the problem here doesn't appear to be
EC-related.  It may be worth trying anyway, though.


--
js
suse labs




