On Tue, 01 May 2018 14:25:54 +0100,
Bjorn Helgaas wrote:

Hi Bjorn,

> On Tue, May 01, 2018 at 01:59:20PM +0100, Marc Zyngier wrote:
> > On 01/05/18 13:38, Sinan Kaya wrote:
> > > +Marc,
> > >
> > > On 4/30/2018 5:27 PM, Sinan Kaya wrote:
> > >> On 4/30/2018 5:17 PM, Bjorn Helgaas wrote:
> > >>>> What should we do about this?
> > >>>>
> > >>>> Since there is an actual HW errata involved, should we quirk this
> > >>>> root port and not wait as if remove/shutdown doesn't exist?
> > >>> I was hoping to avoid a quirk because AFAIK all Intel parts have this
> > >>> issue so it will be an ongoing maintenance issue. I tried to avoid
> > >>> the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
> > >>> timeout from hotplug command start time").
> > >>>
> > >>> But we still see the alarming messages, so we should probably add a
> > >>> quirk to get rid of those.
> > >>>
> > >>> But I haven't given up on the idea of getting rid of the
> > >>> pciehp_remove() path. I'm not convinced yet that we actually need to
> > >>> do anything to shut this device down. I don't like the assumption
> > >>> that kexec requires this. The kexec is fundamentally just a branch,
> > >>> and anything we do before the branch (i.e., in the old kernel), we
> > >>> should also be able to do after the branch (i.e., in the kexec-ed
> > >>> kernel).
> > >>>
> > >>
> > >> In my experience with kexec, MSI type edge interrupts are harmless.
> > >> You might just see a few unhandled interrupt messages during boot
> > >> if something is pending from the first kernel.
> >
> > Unfortunately, that's not always the case.
> >
> > A number of GICv3/v4 implementations (a very common interrupt controller
> > on ARM servers) cannot be disabled, which means they will keep writing
> > to their pending tables long after kexec will have started the new
> > kernel. And since we don't track memory allocation across kexec, you
> > end-up with significant chances of observing single bit corruption as
> > interrupts carry on being delivered. Oh, and you won't actually be able
> > to take MSIs because you can't even reprogram the damn thing.
> >
> > Yes, this can be considered a HW bug.
> >
> > >> It is the level interrupts that are more concerning. It remains pending
> > >> until the interrupt source is cleared. CPU never returns from the
> > >> interrupt handler to actually continue booting the second kernel.
> > >
> > > This makes me wonder why kexec doesn't disable all interrupt sources by
> > > itself instead of relying on the drivers shutdown routine. Some drivers
> > > don't even have a shutdown callback. Kexec could have done both as another
> > > example. Something like.
> > >
> > > 1. Call shutdown for all drivers if available.
> > > 2. Disable all interrupt sources in the interrupt controller
> > > 3. Start the new kernel.
> >
> > See above. Although you can shut off the end-point and to some extent
> > mask interrupts before jumping into the payload, it is not always
> > possible to go back to a reasonable state where you can take actually MSIs.
>
> This is exactly the sort of thing it would be nice to collect and
> document as part of the background of "why kexec works the way it
> does." It certainly helps explain things that are far from obvious if
> you don't have the background.

I'd certainly be happy to help with it if someone was willing to
kickstart such a document. kexec/kdump is a huge bag of "interesting"
tricks, and it has driven me mad over the past couple of months (I'm
typing this from a laptop that uses kexec as its bootloader, and it is
*not fun*).

	M.

--
Jazz is not dead, it just smell funny.