Martin Wilck <martin.wilck at fujitsu-siemens.com> writes:

> Hello Eric,
>
>> How bad is it if you just run with irqpoll in the kdump kernel?
>> If running with irqpoll is usable that is probably preferable
>> to putting in a hardware work around we can survive without.
>
> Yes, I tried that. No effect.

Ok. Later in the thread it sounds like you have retried this and
irqpoll is working now.

>> Have you done any looking at moving where the kernel initializes
>> io_apics? One of the todo items on the path is to leave
>> io_apic mode enabled and just start up the kernel in io_apic
>> mode.
>
> I have tried to recover from the "IRR set" situation in several ways by
> changing setup_IO_APIC_irq(). But I haven't found a way to recover from
> this situation once disable_IO_APIC() had been called.

Yes. The long term goal is to remove the need for calling
disable_IO_APIC(), because that makes the code simpler. Once we get
the kernel to the point where it can start in ioapic mode (and not in
i8259 mode) we can remove the disable code from the kexec on panic
path.

> I concluded that the sequence of events
> "send INT message - never receive EOI - disable IO-APIC pin"
> messes up the IO-APIC (at least this specific one in the
> PCIEx-PCI bridge of the ICH7).

It is quite possible. I have observed a lot of obscure bugs in the
corner cases of the state machines. It is also possible that this is
correct behavior and that it is just specific to level triggered
interrupts, which are almost exclusively not on the first ioapic in a
system like the one you describe.

I suspect the issue is that we never send the EOI message from the
local apic, and so the ioapic waits forever. Or that we have
reprogrammed the vectors by the time we send the EOI message, so that
the EOI and the ioapic don't agree on the vector number when the EOI
message is sent.

Grumble silly level triggered interrupts grumble.

Eric
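
To make the second possibility (the EOI and the ioapic disagreeing on the vector) concrete, here is a toy user-space model of a single level triggered redirection entry. It is only a sketch, not kernel code: every name in it (toy_rte, toy_assert_line, toy_eoi, toy_reprogram, toy_stuck_after_late_eoi) is hypothetical, and it assumes that reprogramming an entry leaves the remote IRR bit untouched and that the bit is cleared only by an EOI whose vector matches the entry. Whether the ICH7 ioapic really behaves this way is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one level triggered IO-APIC redirection entry.
 * All names are hypothetical; this is not kernel code. */
struct toy_rte {
    uint8_t vector;     /* vector currently programmed into the entry */
    bool    masked;     /* pin masked, as after disabling the pin */
    bool    remote_irr; /* set on delivery, cleared only by a matching EOI */
};

/* The device asserts the line. An interrupt is delivered only if the
 * pin is unmasked and no earlier interrupt is still awaiting its EOI. */
bool toy_assert_line(struct toy_rte *e)
{
    assert(e);
    if (e->masked || e->remote_irr)
        return false;
    e->remote_irr = true;
    return true;
}

/* An EOI arrives from the local apic carrying a vector. The entry
 * clears remote IRR only when that vector matches its own. */
void toy_eoi(struct toy_rte *e, uint8_t vector)
{
    assert(e);
    if (e->vector == vector)
        e->remote_irr = false;
}

/* Rewrite vector and mask. ASSUMPTION: remote IRR is left as it was. */
void toy_reprogram(struct toy_rte *e, uint8_t vector, bool masked)
{
    assert(e);
    e->vector = vector;
    e->masked = masked;
}

/* Replays "send INT - no EOI yet - disable pin - reprogram - late EOI".
 * Returns true if the pin can no longer deliver an interrupt. */
bool toy_stuck_after_late_eoi(void)
{
    struct toy_rte e = { .vector = 0x31, .masked = false, .remote_irr = false };

    toy_assert_line(&e);            /* INT message sent, remote IRR set */
    toy_reprogram(&e, 0x00, true);  /* crash path masks the pin */
    toy_reprogram(&e, 0x41, false); /* new kernel picks a new vector */
    toy_eoi(&e, 0x31);              /* late EOI for the old vector: no match */
    return !toy_assert_line(&e);    /* remote IRR still set, so nothing is delivered */
}
```

Under these assumptions toy_stuck_after_late_eoi() returns true: the late EOI carries the old vector, the entry ignores it, and the pin never delivers again. If the EOI is sent before the vector is rewritten, the same model recovers.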