Re: [PATCH v1] iommu/amd: Don't block updates to GATag if guest mode is already on

On 16/02/2023 11:42, Suthikulpanit, Suravee wrote:
> On 2/8/2023 8:19 PM, Joao Martins wrote:
>> On KVM GSI routing table updates, especially for guests that have vIOMMUs
>> with interrupt remapping enabled (e.g. to boot guests with >255 vCPUs without
>> relying on KVM_FEATURE_MSI_EXT_DEST_ID), a VMM may update the backing VF
>> MSIs with new vCPU affinities.
>>
>> On AMD this translates to calls to amd_ir_set_vcpu_affinity() and
>> eventually to amd_iommu_{de}activate_guest_mode() with a new GATag
>> outlining the VM ID and (new) vCPU ID. On vCPU blocking and unblocking
>> paths it disables AVIC and relies on the GALog to convey the wakeups to any
>> sleeping vCPUs. KVM stores a list of GA-mode IR entries for each
>> running/blocked vCPU. So any vCPU affinity update to a VF interrupt happens
>> via KVM, and it changes already-configured guest-mode IRTEs with a new
>> GATag.
> 
> Could we simplify this paragraph to:
> 
> On AMD with AVIC enabled, the new vcpu affinity info is updated via:
>     avic_pi_update_irte()
>         irq_set_vcpu_affinity()
>             amd_ir_set_vcpu_affinity()
>                 amd_iommu_{de}activate_guest_mode()
> 
> where the IRTE[GATag] is updated with the new vcpu affinity. The GATag contains
> VM ID and VCPU ID, and is used by IOMMU hardware to signal KVM (via GALog) when
> an interrupt cannot be delivered because the vCPU is in the blocking state.
> 

Will change for v2.
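
For readers following along, the idea of a GATag that carries both a VM ID and a vCPU ID can be sketched as plain bit packing. This is only an illustration: the demo_* names are hypothetical, and the 24-bit/8-bit split is an assumption made for the example, not something stated in this thread; the real layout is whatever KVM's AVIC code defines.

```c
#include <stdint.h>

/*
 * Illustrative only: field widths are ASSUMED for this sketch and the
 * demo_* names are hypothetical, not kernel identifiers.
 */
#define DEMO_VCPU_ID_BITS 8
#define DEMO_VCPU_ID_MASK ((1u << DEMO_VCPU_ID_BITS) - 1)
#define DEMO_VM_ID_BITS   24
#define DEMO_VM_ID_MASK   ((1u << DEMO_VM_ID_BITS) - 1)

/* Pack a VM ID and a vCPU ID into a single 32-bit tag. */
static inline uint32_t demo_gatag(uint32_t vm_id, uint32_t vcpu_id)
{
    return ((vm_id & DEMO_VM_ID_MASK) << DEMO_VCPU_ID_BITS) |
           (vcpu_id & DEMO_VCPU_ID_MASK);
}

/* Recover the VM ID from a tag, e.g. when handling a GALog entry. */
static inline uint32_t demo_gatag_to_vm_id(uint32_t tag)
{
    return (tag >> DEMO_VCPU_ID_BITS) & DEMO_VM_ID_MASK;
}

/* Recover the vCPU ID from a tag. */
static inline uint32_t demo_gatag_to_vcpu_id(uint32_t tag)
{
    return tag & DEMO_VCPU_ID_MASK;
}
```

With a tag built this way, changing only the vCPU affinity changes only the low bits, which is why a stale tag in the IRTE ends up waking the wrong vCPU.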

>> The issue is that amd_iommu_activate_guest_mode() will essentially only
>> change IRTE fields on transitions from non-guest-mode to guest-mode and
>> otherwise returns *with no changes to IRTE* on already configured
>> guest-mode interrupts. To the guest this means that the VF interrupts
>> remain affined to the vCPU they were first configured with, and the guest
>> will be unable to issue VF interrupts and will receive messages like this
>> from spurious interrupts (e.g. from waking the wrong vCPU in GALog):
>>
>> [  167.759472] __common_interrupt: 3.34 No irq handler for vector
>> [  230.680927] mlx5_core 0000:00:02.0: mlx5_cmd_eq_recover:247:(pid
>> 3122): Recovered 1 EQEs on cmd_eq
>> [  230.681799] mlx5_core 0000:00:02.0:
>> wait_func_handle_exec_timeout:1113:(pid 3122): cmd[0]: CREATE_CQ(0x400)
>> recovered after timeout
>> [  230.683266] __common_interrupt: 3.34 No irq handler for vector
>>
>> Given that amd_ir_set_vcpu_affinity() uses amd_iommu_activate_guest_mode()
>> underneath it essentially means that VCPU affinity changes of IRTEs are
>> nops if it was called once for the IRTE already (on VMENTER). Fix it by
>> dropping the check for guest-mode at amd_iommu_activate_guest_mode().  Same
>> thing is applicable to amd_iommu_deactivate_guest_mode() although, even if
>> the IRTE doesn't change underlying DestID on the host, the VFIO IRQ handler
>> will still be able to poke at the right guest-vCPU.
>>
>> Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation
>> code")
>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> ---
>> Some notes on other related flaws as I looked at this:
>>
>> 1) amd_iommu_deactivate_guest_mode() suffers from the same issue this patch
>> fixes, but it should only matter for the case where you rely on irqbalance-like
>> daemons balancing VFIO IRQs in the hypervisor. Though, it doesn't translate
>> into guest failures, more like performance "misdirection". Happy to fix it, if
>> folks also deem it a problem.
>>
>> 2) This patch doesn't attempt to change the semantics of what
>> amd_iommu_activate_guest_mode() has been doing for a long time [since v5.4]
>> (i.e. it clears the whole IRTE and then changes its fields). As such, when
>> updating the IRTEs the interrupts get isRunning and DestId cleared, thus
>> we rely on the GALog to inject IRQs into vCPUs /until/ the vCPUs block
>> and unblock again (which is when they update the IOMMU affinity), or the
>> AVIC gets momentarily disabled. I have patches that improve this part as a
>> follow-up, but I thought that this patch had value on its own in fixing
>> what has been broken since v5.4 ... and that it could be easily carried
>> to stable trees.
>>
>> ---
>>   drivers/iommu/amd/iommu.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index cbeaab55c0db..afe1f35a4dd9 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -3476,7 +3476,7 @@ int amd_iommu_activate_guest_mode(void *data)
>>       u64 valid;
>>         if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
>> -        !entry || entry->lo.fields_vapic.guest_mode)
>> +        !entry)
>>           return 0;
>>         valid = entry->lo.fields_vapic.valid;
> 
> Apart from the commit message change:
> 
> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
> 
Thanks!
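
To make the behavioural change in the one-line patch concrete, here is a minimal userspace model of it. It is a sketch under loud assumptions: struct demo_irte and both demo_activate_*() helpers are hypothetical stand-ins that keep only three fields, not the kernel's struct irte_ga or the real function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a guest-mode IRTE. */
struct demo_irte {
    bool     guest_mode;
    uint32_t ga_tag;
    uint8_t  vector;
};

/* Models the pre-patch check: return early if already in guest mode. */
static int demo_activate_old(struct demo_irte *e, uint32_t ga_tag,
                             uint8_t vector)
{
    if (!e || e->guest_mode)
        return 0;   /* a later affinity update is silently dropped here */

    e->guest_mode = true;
    e->ga_tag = ga_tag;
    e->vector = vector;
    return 0;
}

/* Models the patched check: always reprogram the entry. */
static int demo_activate_new(struct demo_irte *e, uint32_t ga_tag,
                             uint8_t vector)
{
    if (!e)
        return 0;

    e->guest_mode = true;
    e->ga_tag = ga_tag;
    e->vector = vector;
    return 0;
}
```

Calling demo_activate_old() twice on the same entry leaves the first tag and vector in place, whereas demo_activate_new() picks up the second pair, which is the difference the guest observes as interrupts landing on the wrong vCPU.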

With respect to my notes, please ignore item 1, as it is not a problem as far as
I could reproduce. With respect to item 2, I have the diff below under testing
to address it. It avoids the inefficiency of marking an IRTE as isRunning=0 (and
so avoids relying on the GALog after the IRTE update). So far this is 2
additional patches. Essentially it boils down to fixing amd_iommu_update_ga() so
that @entry and @ref cannot go out of sync, and then switching only the GAVector
and GATag while not touching the DestID and isRunning bit (which are tied to the
running CPU).

@@ -4395,14 +4395,17 @@ int amd_iommu_activate_guest_mode(void *data)
            !entry)
                return 0;

-       valid = entry->lo.fields_vapic.valid;
+       if (!entry->lo.fields_vapic.guest_mode) {
+               valid = entry->lo.fields_vapic.valid;

-       entry->lo.val = 0;
-       entry->hi.val = 0;
+               entry->lo.val = 0;
+               entry->hi.val = 0;
+
+               entry->lo.fields_vapic.valid       = valid;
+               entry->lo.fields_vapic.guest_mode  = 1;
+               entry->lo.fields_vapic.ga_log_intr = 1;
+       }

-       entry->lo.fields_vapic.valid       = valid;
-       entry->lo.fields_vapic.guest_mode  = 1;
-       entry->lo.fields_vapic.ga_log_intr = 1;
        entry->hi.fields.ga_root_ptr       = ir_data->ga_root_ptr;
        entry->hi.fields.vector            = ir_data->ga_vector;
        entry->lo.fields_vapic.ga_tag      = ir_data->ga_tag;
@@ -4579,6 +4582,7 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)

 int amd_iommu_update_ga(int cpu, bool is_run, void *data)
 {
+       int ret;
        unsigned long flags;
        struct amd_iommu *iommu;
        struct irq_remap_table *table;
@@ -4601,15 +4605,18 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)

        raw_spin_lock_irqsave(&table->lock, flags);

-       if (ref->lo.fields_vapic.guest_mode) {
+       if (entry->lo.fields_vapic.guest_mode) {
                if (cpu >= 0) {
-                       ref->lo.fields_vapic.destination =
+                       entry->lo.fields_vapic.destination =
                                                APICID_TO_IRTE_DEST_LO(cpu);
-                       ref->hi.fields.destination =
+                       entry->hi.fields.destination =
                                                APICID_TO_IRTE_DEST_HI(cpu);
                }
-               ref->lo.fields_vapic.is_run = is_run;
-               barrier();
+               entry->lo.fields_vapic.is_run = is_run;
+               ret = cmpxchg_double(&ref->lo.val, &ref->hi.val,
+                                    ref->lo.val, ref->hi.val,
+                                    entry->lo.val, entry->hi.val);
+               WARN_ON(!ret);
        }
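
The intent of the diff above (reset the entry only on the first transition into guest mode, and otherwise touch just the tag and vector) can be modelled in userspace roughly as follows. This is a sketch, not the kernel code: struct sketch_irte and the sketch_*() helpers are hypothetical, ga_root_ptr is left out, and the atomic publication of the entry to the hardware table is not modelled at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of the IRTE fields relevant to the diff. */
struct sketch_irte {
    bool     valid;
    bool     guest_mode;
    bool     ga_log_intr;
    bool     is_run;
    uint32_t destination;
    uint32_t ga_tag;
    uint8_t  vector;
};

/* Only the first activation clears the entry; later ones keep its state. */
static void sketch_activate(struct sketch_irte *e, uint32_t ga_tag,
                            uint8_t vector)
{
    if (!e->guest_mode) {
        bool valid = e->valid;

        *e = (struct sketch_irte){ 0 };
        e->valid = valid;
        e->guest_mode = true;
        e->ga_log_intr = true;
    }

    /* Always refreshed, so affinity changes take effect. */
    e->ga_tag = ga_tag;
    e->vector = vector;
}

/* Models the vCPU load/put path updating destination and isRunning. */
static void sketch_update_ga(struct sketch_irte *e, int cpu, bool is_run)
{
    if (!e->guest_mode)
        return;

    if (cpu >= 0)
        e->destination = (uint32_t)cpu;
    e->is_run = is_run;
}
```

In this model, an affinity update that arrives while the vCPU is running leaves is_run and destination untouched, so delivery does not have to fall back to the GALog until the next block/unblock cycle.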


