On Wed, Sep 04, 2024, Ashish Kalra wrote:
> On 9/4/2024 2:54 PM, Michael Roth wrote:
> >   - Sean inquired about making the target kdump kernel more agnostic to
> >     whether or not SNP_SHUTDOWN was done properly, since that might
> >     allow for capturing state even for edge cases where we can't go
> >     through the normal cleanup path. I mentioned we'd tried this to some
> >     degree but hit issues with the IOMMU, and when working around that
> >     there was another issue but I don't quite recall the specifics.
> >     Can you post a quick recap of what the issues are with that approach
> >     so we can determine whether or not this is still an option?
>
> Yes, i believe without SNP_SHUTDOWN, early_enable_iommus() configure the
> IOMMUs into an IRQ remapping configuration causing the crash in
> io_apic.c::check_timer().
>
> It looks like in this case, we enable IRQ remapping configuration *earlier*
> than when it needs to be enabled and which causes the panic as indicated:
>
> EMERGENCY [ 1.376701] Kernel panic - not syncing: timer doesn't work
> through Interrupt-remapped IO-APIC

I assume the problem is that IOMMU setup fails in the kdump kernel, not that it
does the setup earlier.  That's the part I want to understand.

Based on the SNP ABI:

  The firmware initializes the IOMMU to perform RMP enforcement. The firmware
  also transitions the event log, PPR log, and completion wait buffers of the
  IOMMU to an RMP page state that is read only to the hypervisor and cannot be
  assigned to guests.

and commit f366a8dac1b8 ("iommu/amd: Clean up RMP entries for IOMMU pages during
SNP shutdown"), my understanding is that the pages used for the IOMMU logs are
forced to read-only for the IOMMU, and so attempting to access those pages in
the kdump kernel will result in an RMP #PF.

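To make the concern concrete, here is a toy model of the reasoning above. It is
only a sketch: every name in it is hypothetical and none of it is kernel or
firmware code. It assumes SNP is still enabled (no SNP_SHUTDOWN) and that any
page not in the plain hypervisor state faults on a host write.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not kernel or firmware APIs. */
enum page_owner {
	PAGE_HYPERVISOR,	/* ordinary host memory */
	PAGE_GUEST_PRIVATE,	/* assigned to an SNP guest */
	PAGE_FW_IOMMU_LOG,	/* event log, PPR log, completion wait buffer */
};

/* Assumption: with SNP still enabled, a host write to a non-hypervisor page
 * trips an RMP #PF. */
static inline bool host_write_faults(enum page_owner owner)
{
	return owner != PAGE_HYPERVISOR;
}

/* The "eat the #PF" idea assumed only guest private memory would fault. */
static inline bool fault_is_ignorable(enum page_owner owner)
{
	return owner == PAGE_GUEST_PRIVATE;
}
```

In this model, PAGE_FW_IOMMU_LOG faults on a host write but is not covered by
fault_is_ignorable(), which is exactly the case the kdump kernel would hit when
it reinitializes the IOMMU.
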
That's quite unfortunate, as it means my idea of eating RMP #PFs doesn't really
work, because that idea is based on the assumption that only guest private
memory would generate unexpected RMP #PFs :-(

> Next, we tried with amd_iommu=off, with that we don't get the irq remapping
> panic during crashkernel boot, but boot still hangs before starting kdump
> tools.
>
> So eventually we discovered that irqremapping is required for x2apic and with
> amd_iommu=off we don't enable irqremapping at all.

Yeah, that makes sense, as does failing to boot if the system isn't configured
properly, i.e. can't send interrupts to all CPUs.
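
A rough sketch of why amd_iommu=off can leave CPUs unreachable. This is a
simplified toy model with hypothetical names, not the actual x2apic or IOMMU
code. It assumes that without interrupt remapping, interrupts can only target
APIC IDs that fit in 8 bits (0-255).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 8-bit destination IDs without IRQ remapping. */
#define MAX_APIC_ID_NO_REMAP 255u

/* Hypothetical helper: can an interrupt be routed to this CPU? */
static inline bool cpu_reachable(uint32_t apic_id, bool irq_remap_enabled)
{
	return irq_remap_enabled || apic_id <= MAX_APIC_ID_NO_REMAP;
}

/* Hypothetical helper: true only if every CPU in the list is reachable. */
static inline bool all_cpus_reachable(const uint32_t *apic_ids,
				      unsigned int nr_cpus,
				      bool irq_remap_enabled)
{
	for (unsigned int i = 0; i < nr_cpus; i++) {
		if (!cpu_reachable(apic_ids[i], irq_remap_enabled))
			return false;
	}
	return true;
}
```

With APIC IDs {0, 1, 256} and irq_remap_enabled == false, all_cpus_reachable()
returns false in this model, which corresponds to the misconfigured case where
the system can't send interrupts to all CPUs.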