On Mon, Oct 26, 2020 at 12:11 AM Pingfan Liu <kernelfans@xxxxxxxxx> wrote:
>
> On Sun, Oct 25, 2020 at 8:21 PM Oliver O'Halloran <oohall@xxxxxxxxx> wrote:
> >
> > On Sun, Oct 25, 2020 at 10:22 PM Pingfan Liu <kernelfans@xxxxxxxxx> wrote:
> > >
> > > On Thu, Oct 22, 2020 at 4:37 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Oct 22 2020 at 13:56, Pingfan Liu wrote:
> > > > > I hit an irqflood bug on powerpc platform, and two years ago, on an x86 platform.
> > > > > When the bug happens, the kernel is totally occupied by irq. Currently, there
> > > > > may be nothing or just soft lockup warning shown in console. It is better
> > > > > to warn users with irq flood info.
> > > > >
> > > > > In the kdump case, the kernel can move on by suppressing the irq flood.
> > > >
> > > > You're curing the symptom not the cause and the cure is just magic and
> > > > can't work reliably.
> > > Yeah, it is magic. But at least, it is better to printk something and
> > > alarm users about what happens. With current code, it may show nothing
> > > when system hangs.
> > > >
> > > > Where is that irq flood originated from and why is none of the
> > > > mechanisms we have in place to shut it up working?
> > > The bug originates from a driver tpm_i2c_nuvoton, which calls i2c-bus
> > > driver (i2c-opal.c). After i2c_opal_send_request(), the bug is
> > > triggered.
> > >
> > > But things are complicated by introducing a firmware layer: Skiboot.
> > > This software layer hides the detail of manipulating the hardware from
> > > Linux.
> > >
> > > I guess the software logic can not enter a sane state when kernel crashes.
> > >
> > > Cc Skiboot and ppc64 community to see whether anyone has idea about it.
> >
> > What system are you using?
>
> Here is the info, if not enough, I will get more.
> Product Name    : OpenPOWER Firmware
> Product Version : open-power-SUPERMICRO-P9DSU-V1.16-20180531-imp
> Product Extra   : op-build-e4b3eb5
> Product Extra   : skiboot-v6.0-p1da203b
> Product Extra   : hostboot-f911e5c-pda8239f
> Product Extra   : occ-77bb5e6-p623d1cd
> Product Extra   : linux-4.16.7-openpower2-pbc45895
> Product Extra   : petitboot-v1.7.1-pf773c0d
> Product Extra   : machine-xml-218a77a

Unfortunately I don't have a schematic for that one.

> > There's an external interrupt pin which is supposed to be wired to the
> > TPM. I think we bounce that interrupt to FW by default since the
> > external interrupt is sometimes used for other system-specific
> > purposes. Odds are FW doesn't know what to do with it so you
> > effectively have an always-on LSI. I fixed a similar bug a while ago
> > by having skiboot mask any interrupts it doesn't have a handler for,
>
> This sounds like the root cause. But here Skiboot should have handler,
> otherwise the first kernel can not run smoothly.

I don't know why the TPM interrupt is asserted. If the TPM driver is
polling for a response it might clear the underlying condition as a
side effect of its normal operation.

> Do you have any idea about an unexpected re-initialization introducing
> an unsane stage?

No idea, but those TPMs have a history of bricking themselves if you
do anything slightly odd to them. It wouldn't surprise me if the
re-probe can cause issues.

> Thanks,
> Pingfan

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec