Re: tpm_sis IRQ storm on ThinkStation P360 Tiny

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 11, 2023 at 03:02:13AM +0300, Jarkko Sakkinen wrote:
> On Fri May 5, 2023 at 6:05 PM EEST, Jerry Snitselaar wrote:
> > On Fri, May 05, 2023 at 03:07:31PM +0200, Peter Zijlstra wrote:
> > > Hi,
> > > 
> > > I recently saw my Alderlake NUC spewing on boot:
> > > 
> > > [   13.166514] irq 109: nobody cared (try booting with the "irqpoll" option)
> > > [   13.166614] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.3.0+ #66
> > > [   13.166694] Hardware name: LENOVO 30FBS0B800/330E, BIOS M4GKT18A 04/26/2022
> > > [   13.166779] Call Trace:
> > > [   13.166812]  <IRQ>
> > > [   13.166840]  dump_stack_lvl+0x5b/0x90
> > > [   13.166891]  __report_bad_irq+0x2b/0xc0
> > > [   13.166941]  note_interrupt+0x2ac/0x2f0
> > > [   13.166991]  handle_irq_event+0x6f/0x80
> > > [   13.167041]  handle_fasteoi_irq+0x94/0x1f0
> > > [   13.167093]  __common_interrupt+0x72/0x160
> > > [   13.167112] intel_rapl_common: Found RAPL domain package
> > > [   13.167141]  common_interrupt+0xb8/0xe0
> > > [   13.167200] intel_rapl_common: Found RAPL domain core
> > > [   13.167242]  </IRQ>
> > > [   13.167297] intel_rapl_common: Found RAPL domain uncore
> > > [   13.167322]  <TASK>
> > > [   13.167437]  asm_common_interrupt+0x26/0x40
> > > [   13.167492] RIP: 0010:cpuidle_enter_state+0xff/0x500
> > > [   13.167554] Code: c0 48 0f a3 05 72 34 ad 01 0f 82 fc 02 00 00 31 ff e8 35 b3 52 ff 45 84 ff 0f 85 cc 02 00 00 e8 f7 13 64 ff fb 0f 1f 44 00 00 <45> 85 f6 0f 88 eb 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
> > > [   13.167766] RSP: 0018:ffffc900001ebe90 EFLAGS: 00000206
> > > [   13.167843] RAX: 000000000012a8f3 RBX: ffffe8ffff480a00 RCX: 0000000000000000
> > > [   13.167928] RDX: 0000000000000000 RSI: ffffffff8244c6ee RDI: ffffffff8242ca22
> > > [   13.168021] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
> > > [   13.168105] R10: 0000000000000003 R11: 000000000000000a R12: ffffffff83625a80
> > > [   13.168189] R13: 0000000310c8ee5e R14: 0000000000000001 R15: 0000000000000000
> > > [   13.168289]  cpuidle_enter+0x2d/0x40
> > > [   13.168339]  do_idle+0x231/0x290
> > > [   13.168383]  cpu_startup_entry+0x1d/0x20
> > > [   13.168432]  start_secondary+0x11b/0x140
> > > [   13.168482]  secondary_startup_64_no_verify+0xf9/0xfb
> > > [   13.168549]  </TASK>
> > > [   13.168587] handlers:
> > > [   13.168617] [<00000000497ef927>] irq_default_primary_handler threaded [<00000000cf102de1>] tis_int_handler
> > > [   13.168767] Disabling IRQ #109
> > > 
> > > this is apparently:
> > > 
> > > root@alderlake:~# cat /proc/interrupts | grep 109
> > > 109:          0          0          0          0          0     100002          0          0          0          0          0          0          0          0          0          0          0          0          0          0          00          0          0  IR-IO-APIC  109-fasteoi   tpm0
> > > 
> > > the TPM thing, which per same dmesg above is:
> > > 
> > > [   10.948058] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1D, rev-id 54)
> > > 
> > > Booting with tpm_tis.interrupts=0 seems to cure things, and AFAICT the
> > > tpm device actually works -- that is, tpm2 getcap -l and tpm2 pcrread
> > > both give output, I'm presuming this is 'good'. I've never operated a
> > > TPM before.
> > > 
> > > The machine in question is:
> > > 
> > > 	Manufacturer: LENOVO
> > > 	Product Name: 30FBS0B800
> > > 	Version: ThinkStation P360 Tiny
> > > 
> > > So I'm thinking that perhaps Lenovo carried the bug mentioned in commit:
> > > b154ce11ead9 ("tpm_tis: Disable interrupts on ThinkPad T490s") to more
> > > products.
> >
> > Hi Peter,
> >
> > It will poll like it has for years with tpm_tis.interrupts=0 so that
> > should be working as it was prior to 6.3 when interrupts were re-enabled
> > for tpm_tis. Are you seeing this with 6.2 as well? IIRC with that Thinkpad
> > case is when it was first realized that interrupts had accidentally been
> > disabled for tpm_tis at one point by a change.
> >
> > I guess myself or someone else needs to revisit catching this in
> > general when the irq storm happens, and disabling interrupts for
> > tpm_tis. I think last time I was incorporating some feedback from
> > tglx, let my adhd get me distracted with some other issue and never
> > returned to it.
> >
> > The diff below should (compile tested) work for the P360, but
> > tpm_tis.interrupts=0 is a good work-around.
> >
> > Regards,
> > Jerry
> >
> >
> > diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
> > index 7af389806643..12dfdbef574d 100644
> > --- a/drivers/char/tpm/tpm_tis.c
> > +++ b/drivers/char/tpm/tpm_tis.c
> > @@ -122,6 +122,14 @@ static const struct dmi_system_id tpm_tis_dmi_table[] = {
> >  			DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
> >  		},
> >  	},
> > +	{
> > +		.callback = tpm_tis_disable_irq,
> > +		.ident = "ThinkStation P360 Tiny",
> > +		.matches = {
> > +			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> > +			DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkStation P360 Tiny"),
> > +		},
> > +	},
> >  	{}
> >  };
> 
> OK, this is correct :-) I wonder how my fix candidate passed the
> compiler. Can you send this as a formal patch, which I can then
> ack?
> 
> BR, Jarkko

Patch sent. Is there a way to stack matches in one entry instead of
having complete separate entries for each one? I did a quick scan last
night, but didn't see anything.

I went looking for my craptacular attempt to catch this happening in
general from before.  IIRC one problem was I couldn't just increment a
counter when the handler fired, which is how I ended up coming across
the open coded kstat_irqs in one of the network drivers, and then
thrilled Thomas by trying to export that :). The other issue was doing
the clean up, because disable_interrupts couldn't be called in the
interrupt context. I'm wondering if you can just clear the interrupt
enable when it is caught, and then schedule_work to deal with the
devm_free_irq and other tidying up.

Lino definitely understands the interrupt bits of the tpm better than I
do, so I imagine Lino's attempt will be better than mine was. :)

Regards,
Jerry




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux Kernel]     [Linux Kernel Hardening]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux