Hello Sean,

On 9/4/2024 5:23 PM, Sean Christopherson wrote:
>> On Wed, Sep 04, 2024, Ashish Kalra wrote:
>>> On 9/4/2024 2:54 PM, Michael Roth wrote:
>>>> - Sean inquired about making the target kdump kernel more agnostic to
>>>> whether or not SNP_SHUTDOWN was done properly, since that might
>>>> allow for capturing state even for edge cases where we can't go
>>>> through the normal cleanup path. I mentioned we'd tried this to some
>>>> degree but hit issues with the IOMMU, and when working around that
>>>> there was another issue but I don't quite recall the specifics.
>>>> Can you post a quick recap of what the issues are with that approach
>>>> so we can determine whether or not this is still an option?
>>>
>>> Yes, i believe without SNP_SHUTDOWN, early_enable_iommus() configure the
>>> IOMMUs into an IRQ remapping configuration causing the crash in
>>> io_apic.c::check_timer().
>>>
>>> It looks like in this case, we enable IRQ remapping configuration *earlier*
>>> than when it needs to be enabled and which causes the panic as indicated:
>>>
>>> EMERGENCY [ 1.376701] Kernel panic - not syncing: timer doesn't work
>>> through Interrupt-remapped IO-APIC
>>
>> I assume the problem is that IOMMU setup fails in the kdump kernel, not that it
>> does the setup earlier. That's that part I want to understand.
>Here is a deeper understanding of this issue:
>It looks like this is happening: when we do SNP_SHUTDOWN without IOMMU_SNP_SHUTDOWN during panic, kdump boot runs with iommu snp
>enforcement still enabled and IOMMU completion wait buffers (cwb) still locked and exclusivity still setup on those, and then in
>kdump boot, we allocate new iommu completion wait buffers and try to use them, but we get a iommu command completion wait time-out,
>due to the locked in (prev) completion wait buffers, the newly allocated completion wait buffers are not getting used for iommu
>command execution and completion indication :
>[ 1.711588] AMD-Vi: early_amd_iommu_init: irq remaping enabled
>[ 1.718972] AMD-Vi: in early_enable_iommus
>[ 1.723543] AMD-Vi: Translation is already enabled - trying to copy translation structures
>[ 1.733333] AMD-Vi: Copied DEV table from previous kernel.
>[ 1.739566] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.11.0-rc6-next-20240903-snp-host-f2a41ff576cc+ #78
>[ 1.750920] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM100AB 10/17/2022
>[ 1.759950] Call Trace:
>[ 1.762677] <TASK>
>[ 1.765018] dump_stack_lvl+0x70/0x90
>[ 1.769109] dump_stack+0x14/0x20
>[ 1.772809] iommu_completion_wait.part.0.isra.0+0x38/0x140
>[ 1.779035] amd_iommu_flush_all_caches+0xa3/0x240
>[ 1.784383] ? memcpy_toio+0x25/0xc0
>[ 1.788372] early_enable_iommus+0x151/0x880
>[ 1.793140] state_next+0xe67/0x22b0
>[ 1.797130] ? __raw_callee_save___native_queued_spin_unlock+0x19/0x30
>[ 1.804421] amd_iommu_enable+0x24/0x60
>[ 1.808702] irq_remapping_enable+0x1f/0x50
>[ 1.813371] enable_IR_x2apic+0x155/0x260
>[ 1.817848] x86_64_probe_apic+0x13/0x70
>[ 1.822226] apic_intr_mode_init+0x39/0xf0
>[ 1.826799] x86_late_time_init+0x28/0x40
>[ 1.831266] start_kernel+0x6ad/0xb50
>[ 1.835436] x86_64_start_reservations+0x1c/0x30
>[ 1.840591] x86_64_start_kernel+0xbf/0x110
>[ 1.845256] ? setup_ghcb+0x12/0x130
>[ 1.849247] common_startup_64+0x13e/0x141
>[ 1.853821] </TASK>
>[ 2.077901] AMD-Vi: Completion-Wait loop timed out
>...
>And because of this the iommu command, in this case which is for enabling irq remapping does not succeed and that eventually causes
>timer to fail without irq remapping support enabled.
>Once IOMMU SNP support is enabled, to enforce RMP enforcement the IOMMU completion wait buffers are setup as read-only and
>exclusivity set on these and additionally the IOMMU registers used to mark the exclusivity on the store addresses associated with
>these CWB is also locked. This enforcement of SNP in the IOMMU is only disabled with the IOMMU_SNP_SHUTDOWN parameter with
>SNP_SHUTDOWN_EX command.
>From the AMD IOMMU specifications:
>2.12.2.2 SEV-SNP COMPLETION_WAIT Store Restrictions
>On systems that are SNP-enabled, the store address associated with any host
>COMPLETION_WAIT command (s=1) is restricted. The Store Address must fall within the address range specified by the Completion Store
>Base and Completion Store Limit registers. When the system is SNP-enabled, the memory within this range will be marked in the RMP
>using a special immutable state by the PSP. This memory region will be readable by the CPU but not writable.
>2.12.2.3 SEV-SNP Exclusion Range Restrictions
>The exclusion range feature is not supported on systems that are SNP-enabled.
>Additionally, the Exclusion Base and Exclusion Range Limit registers are re-purposed to act as the Completion Store Base and Limit
>registers.
>Therefore, we need to disable IOMMU SNP enforcement with SNP_SHUTDOWN_EX command before the kdump kernel starts booting as we can't
>setup IOMMU CWB again in kdump as SEV-SNP exclusion base and range limit registers are locked as IOMMU SNP support is still enabled.
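
To make the store restriction from section 2.12.2.2 above concrete, here is
a minimal userspace sketch (not kernel code) of the range check it
describes. The struct, the function name, the inclusive treatment of the
limit and the 8-byte store size are all assumptions made for illustration;
they are not taken from the IOMMU driver or from the register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the re-purposed exclusion registers (2.12.2.3). */
struct cwb_range {
	uint64_t base;   /* Completion Store Base, first valid byte */
	uint64_t limit;  /* Completion Store Limit, last valid byte (assumed inclusive) */
};

/*
 * Return true if an 8-byte COMPLETION_WAIT store at 'addr' lies entirely
 * inside the locked range. A buffer freshly allocated by the kdump kernel
 * would normally sit outside the range left behind by the crashed kernel,
 * so this model rejects its store address.
 */
static bool cwb_store_allowed(const struct cwb_range *r, uint64_t addr)
{
	assert(r->base <= r->limit);

	if (addr < r->base || addr > r->limit)
		return false;

	/* The whole 8-byte store must fit below the limit. */
	return r->limit - addr >= sizeof(uint64_t) - 1;
}
```

In this model, a range covering a single 4K page starting at the
cmd_sem_paddr from the log (0x100805000) accepts a store at that address and
rejects one at an unrelated address, which is the situation the kdump kernel
is in when it allocates new buffers while the old range is still locked.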
>I tried to use the previous kernel's CWB (cmd_sem) as below:
>static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>{
>        if (!is_kdump_kernel())
>                iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
>        else {
>                if (check_feature(FEATURE_SNP)) {
>                        u64 cwwb_sem_paddr;
>
>                        cwwb_sem_paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET);
>                        iommu->cmd_sem = iommu_phys_to_virt(cwwb_sem_paddr);
>                        return iommu->cmd_sem ? 0 : -ENOMEM;
>                }
>        }
>
>        return iommu->cmd_sem ? 0 : -ENOMEM;
>}
>I tried this, but this fails as i believe the kdump kernel will not have these previous kernel's allocated IOMMU CWB in the kernel
>direct map :
>[ 1.708959] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.714327] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100805000, cmd_sem_vaddr 0xffff9f5340805000
>[ 1.726309] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.731676] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050051000, cmd_sem_vaddr 0xffff9f6290051000
>[ 1.743742] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.749109] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050052000, cmd_sem_vaddr 0xffff9f6290052000
>[ 1.761177] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.766542] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100808000, cmd_sem_vaddr 0xffff9f5340808000
>[ 1.778509] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.783877] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050053000, cmd_sem_vaddr 0xffff9f6290053000
>[ 1.795942] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.801300] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100809000, cmd_sem_vaddr 0xffff9f5340809000
>[ 1.813268] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.818636] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050054000, cmd_sem_vaddr 0xffff9f6290054000
>[ 1.830701] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.836069] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x10080a000, cmd_sem_vaddr 0xffff9f534080a000
>[ 1.848039] AMD-Vi: early_amd_iommu_init: irq remaping enabled
>[ 1.855431] AMD-Vi: in early_enable_iommus
>[ 1.860032] AMD-Vi: Translation is already enabled - trying to copy translation structures
>[ 1.869812] AMD-Vi: Copied DEV table from previous kernel.
>[ 1.875958] AMD-Vi: in build_completion_wait, paddr = 0x100805000
>[ 1.882766] BUG: unable to handle page fault for address: ffff9f5340805000
>[ 1.890441] #PF: supervisor read access in kernel mode
>[ 1.896177] #PF: error_code(0x0000) - not-present page
>....
>I think that memremap(..,..,MEMREMAP_WB) will also fail for the same reason as memremap(.., MEMREMAP_WB) for the RAM region will
>again use the kernel directmap.

To follow up on this:

I am able to use memremap() to map the CWB buffers allocated by the
previous kernel and to try reusing them in the kdump kernel. As expected,
memremap() does not return a pointer into the kernel direct map, since the
previous kernel's CWB buffers sit at RAM addresses that are not mapped in
the kdump kernel's direct map.

These memremap() mappings appear to be correct: a memset(0) on them
triggers an RMP #PF violation, because the buffers are set up as read-only
in the RMP table.

However, I am getting inconsistent IOMMU command completion-wait timeouts
with these reused CWB buffers (which are used as semaphores to indicate
IOMMU command completion), and I am still debugging those issues.

Thanks,
Ashish