Hi,
On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
James Bottomley @ 2020-10-13 08:24 MST:
On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
Jarkko Sakkinen @ 2020-10-12 18:17 MST:
On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
The current state of the TIS TPM is that interrupts have been
globally disabled by various changes. The problems we got
reported the last time they were enabled was interrupt
storms. With my own TIS TPM, I've found that this is caused
because my TPM doesn't do legacy cycles. The TIS spec (chapter
6.1 "Locality Usage Per Register") requires any TIS TPM without
legacy cycles not to act on any write to an interrupt register
unless the locality is enabled. This means if an interrupt fires
after we relinquish the locality, the TPM_EOI in the interrupt
routine is ineffective, meaning the same interrupt triggers over
and over again. This problem also means we can have trouble
setting up interrupts on TIS TPMs because the current init
code does the setup before the locality is claimed for the first
time.
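
The failure mode described above can be illustrated with a small
user-space toy model. This is only a sketch: the struct and function
names (toy_tpm, toy_ack_irq, toy_count_irq_runs) are hypothetical and
are not the tpm_tis driver API. It assumes a TPM without legacy cycles
that silently drops interrupt-register writes while the locality is not
held, and it caps the handler re-runs so the "storm" terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the real tpm_tis driver. */
struct toy_tpm {
	bool locality_held;	/* has the host claimed the locality? */
	bool irq_pending;	/* interrupt status still set in the device */
};

/*
 * Model of the acknowledge (EOI) write in the interrupt routine.
 * Assumption: without legacy cycles the device ignores the write
 * unless the locality is held. Returns true if the write took effect.
 */
static bool toy_ack_irq(struct toy_tpm *tpm)
{
	assert(tpm != NULL);
	if (!tpm->locality_held)
		return false;	/* write dropped, status stays set */
	tpm->irq_pending = false;
	return true;
}

/*
 * Count how often the handler runs before the interrupt is cleared,
 * giving up after max_runs iterations.
 */
static int toy_count_irq_runs(struct toy_tpm *tpm, int max_runs)
{
	int runs = 0;

	assert(tpm != NULL);
	while (tpm->irq_pending && runs < max_runs) {
		runs++;
		toy_ack_irq(tpm);
	}
	return runs;
}
```

With the locality held the handler runs once and the interrupt clears;
with the locality relinquished it keeps re-running until the cap is hit,
which is the storm pattern in miniature.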
James
You should consider expanding the audience.
Well, most people interested in testing this sort of thing are already
on the integrity list.
Jerry, once you have some bandwidth (no rush, does not land before
rc2), it would be great if you could try this. I'm emphasizing
this just because of the intersection. I think it would also make
sense to get tested-by from Nayna.
I will run some tests on some other systems I have access to. As
noted in the other email I did a quick test with a t490s with an
older BIOS that exhibits the problem originally reported when
Stefan's patch enabled interrupts.
Well, it means there's still some other problem. I was hoping that
because the rainbow pass system originally exhibited the same symptoms
(interrupt storm) fixing it would also fix the t490 and the ineffective
EOI bug looked like a great candidate for being the root cause.
Adding Hans to the list.
IIUC in the t490s case the problem lies with the hardware itself. Hans,
is that correct?
More or less. AFAIK / have been told by Lenovo it is an issue with the
configuration of the interrupt-type of the GPIO pin used for the IRQ,
which is a firmware issue which could be fixed by a BIOS update
(the pin is set up as a direct-irq pin for the APIC, so the OS has no
control of the IRQ type since with APIC irqs this is all supposed to
be set up properly beforehand).
But it is a model-specific issue; if we denylist IRQ usage on this
Lenovo model (and probably a few others) then we should be able to
restore the IRQ code to normal functionality for all other device
models which declare an IRQ in their resource tables.
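
To make the denylist idea concrete, here is a minimal user-space sketch.
Everything in it is an assumption for illustration: the table name, the
helper toy_irq_allowed(), and the model string are hypothetical. In the
kernel this kind of check would more likely be built on DMI matching
(struct dmi_system_id with dmi_check_system()) than on a plain string
compare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical denylist of models whose firmware misconfigures the
 * TPM IRQ pin. The entry below is an illustrative assumption only.
 */
static const char *const toy_irq_denylist[] = {
	"ThinkPad T490s",
};

/* Return false if IRQ usage should be skipped for this model string. */
static bool toy_irq_allowed(const char *model)
{
	size_t n = sizeof(toy_irq_denylist) / sizeof(toy_irq_denylist[0]);

	assert(model != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(model, toy_irq_denylist[i]) == 0)
			return false;	/* listed model: fall back to polling */
	}
	return true;	/* everything else keeps its declared IRQ */
}
```

Listed models would fall back to polling, while every other device that
declares an IRQ in its resource tables keeps using it.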
Regards,
Hans