Re: [PATCH v2] x86/sev: Fix host kdump support for SNP

Hello Sean,

On 9/4/2024 5:23 PM, Sean Christopherson wrote:
> On Wed, Sep 04, 2024, Ashish Kalra wrote:
>> On 9/4/2024 2:54 PM, Michael Roth wrote:
>>>   - Sean inquired about making the target kdump kernel more agnostic to
>>>     whether or not SNP_SHUTDOWN was done properly, since that might
>>>     allow for capturing state even for edge cases where we can't go
>>>     through the normal cleanup path. I mentioned we'd tried this to some
>>>     degree but hit issues with the IOMMU, and when working around that
>>>     there was another issue but I don't quite recall the specifics.
>>>     Can you post a quick recap of what the issues are with that approach
>>>     so we can determine whether or not this is still an option?
>>
>> Yes, i believe without SNP_SHUTDOWN, early_enable_iommus() configure the
>> IOMMUs into an IRQ remapping configuration causing the crash in
>> io_apic.c::check_timer().
>>
>> It looks like in this case, we enable IRQ remapping configuration *earlier*
>> than when it needs to be enabled and which causes the panic as indicated:
>>
>> EMERGENCY [    1.376701] Kernel panic - not syncing: timer doesn't work
>> through Interrupt-remapped IO-APIC
>
> I assume the problem is that IOMMU setup fails in the kdump kernel, not that it
> does the setup earlier.  That's that part I want to understand.

Here is a more detailed analysis of this issue:

When we do SNP_SHUTDOWN without IOMMU_SNP_SHUTDOWN during panic, the kdump kernel boots with IOMMU SNP enforcement still enabled, and
with the previous kernel's IOMMU completion wait buffers (CWBs) still locked and exclusivity still set up on them. The kdump kernel
then allocates new IOMMU completion wait buffers and tries to use them, but because the previous completion wait buffers are still
locked in, the newly allocated buffers are not used for IOMMU command execution and completion indication, and we get an IOMMU
command completion-wait timeout:

[    1.711588] AMD-Vi: early_amd_iommu_init: irq remaping enabled
[    1.718972] AMD-Vi: in early_enable_iommus
[    1.723543] AMD-Vi: Translation is already enabled - trying to copy translation structures
[    1.733333] AMD-Vi: Copied DEV table from previous kernel.
[    1.739566] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.11.0-rc6-next-20240903-snp-host-f2a41ff576cc+ #78
[    1.750920] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM100AB 10/17/2022
[    1.759950] Call Trace:
[    1.762677]  <TASK>
[    1.765018]  dump_stack_lvl+0x70/0x90
[    1.769109]  dump_stack+0x14/0x20
[    1.772809]  iommu_completion_wait.part.0.isra.0+0x38/0x140
[    1.779035]  amd_iommu_flush_all_caches+0xa3/0x240
[    1.784383]  ? memcpy_toio+0x25/0xc0
[    1.788372]  early_enable_iommus+0x151/0x880
[    1.793140]  state_next+0xe67/0x22b0
[    1.797130]  ? __raw_callee_save___native_queued_spin_unlock+0x19/0x30
[    1.804421]  amd_iommu_enable+0x24/0x60
[    1.808702]  irq_remapping_enable+0x1f/0x50
[    1.813371]  enable_IR_x2apic+0x155/0x260
[    1.817848]  x86_64_probe_apic+0x13/0x70
[    1.822226]  apic_intr_mode_init+0x39/0xf0
[    1.826799]  x86_late_time_init+0x28/0x40
[    1.831266]  start_kernel+0x6ad/0xb50
[    1.835436]  x86_64_start_reservations+0x1c/0x30
[    1.840591]  x86_64_start_kernel+0xbf/0x110
[    1.845256]  ? setup_ghcb+0x12/0x130
[    1.849247]  common_startup_64+0x13e/0x141
[    1.853821]  </TASK>
[    2.077901] AMD-Vi: Completion-Wait loop timed out
...

Because of this, the IOMMU command (in this case the one for enabling IRQ remapping) does not succeed, and that eventually causes
the timer to fail, since IRQ remapping support is not actually enabled.

Once IOMMU SNP support is enabled, for RMP enforcement the IOMMU completion wait buffers are set up as read-only with exclusivity
set on them, and additionally the IOMMU registers used to mark the exclusivity on the store addresses associated with these CWBs
are locked. This SNP enforcement in the IOMMU is only disabled by passing the IOMMU_SNP_SHUTDOWN parameter to the
SNP_SHUTDOWN_EX command.
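
To make the sequence above easier to follow, here is a rough user-space toy model in C. It is only a sketch of the behaviour as I
described it, not driver or firmware code: every name in it (model_iommu, model_set_cwb(), model_kdump_boot(), etc.) is made up
for illustration, and the assumption that a locked store address silently ignores reprogramming is a simplification on my part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct model_iommu {
	volatile uint64_t *store_addr;	/* where completion is signalled */
	bool store_addr_locked;		/* true while IOMMU SNP enforcement is on */
};

/* Driver side: try to point the IOMMU at a newly allocated CWB. */
static void model_set_cwb(struct model_iommu *iommu, volatile uint64_t *cwb)
{
	/* Assumption: while locked, the old store address stays in effect. */
	if (iommu->store_addr_locked)
		return;
	iommu->store_addr = cwb;
}

/* Stand-in for SNP_SHUTDOWN_EX with IOMMU_SNP_SHUTDOWN set. */
static void model_iommu_snp_shutdown(struct model_iommu *iommu)
{
	iommu->store_addr_locked = false;
}

/* Hardware side: signal completion at whatever address is in effect. */
static void model_hw_complete(struct model_iommu *iommu, uint64_t marker)
{
	*iommu->store_addr = marker;
}

/* Driver side: issue a completion wait and poll the CWB it expects. */
static int model_completion_wait(struct model_iommu *iommu,
				 volatile uint64_t *cwb, uint64_t marker,
				 unsigned int max_loops)
{
	unsigned int i;

	model_hw_complete(iommu, marker);
	for (i = 0; i < max_loops; i++) {
		if (*cwb == marker)
			return 0;	/* completion seen in our buffer */
	}
	return -1;			/* "Completion-Wait loop timed out" */
}

/* Previous kernel left SNP enabled; kdump kernel allocates a new CWB. */
static int model_kdump_boot(bool iommu_snp_shutdown_done)
{
	volatile uint64_t old_cwb = 0, new_cwb = 0;
	struct model_iommu iommu = {
		.store_addr = &old_cwb,
		.store_addr_locked = true,
	};

	if (iommu_snp_shutdown_done)
		model_iommu_snp_shutdown(&iommu);

	model_set_cwb(&iommu, &new_cwb);
	return model_completion_wait(&iommu, &new_cwb, 1, 1000);
}
```

In this model, model_kdump_boot(false) times out because completion is still signalled in the previous kernel's buffer, while
model_kdump_boot(true) completes normally once the lock has been released.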

