On Thu, Mar 16, 2023, Joao Martins wrote: > On KVM GSI routing table updates, specially those where they have vIOMMUs > with interrupt remapping enabled (to boot >255vcpus setups without relying > on KVM_FEATURE_MSI_EXT_DEST_ID), a VMM may update the backing VF MSIs > with a new VCPU affinity. > > On AMD with AVIC enabled, the new vcpu affinity info is updated via: > avic_pi_update_irte() > irq_set_vcpu_affinity() > amd_ir_set_vcpu_affinity() > amd_iommu_{de}activate_guest_mode() > > Where the IRTE[GATag] is updated with the new vcpu affinity. The GATag > contains VM ID and VCPU ID, and is used by IOMMU hardware to signal KVM > (via GALog) when interrupt cannot be delivered due to vCPU is in > blocking state. > > The issue is that amd_iommu_activate_guest_mode() will essentially > only change IRTE fields on transitions from non-guest-mode to guest-mode > and otherwise returns *with no changes to IRTE* on already configured > guest-mode interrupts. To the guest this means that the VF interrupts > remain affined to the first vCPU they were first configured, and guest > will be unable to either VF interrupts and receive messages like this > from spuruious interrupts (e.g. from waking the wrong vCPU in GALog): > > [ 167.759472] __common_interrupt: 3.34 No irq handler for vector > [ 230.680927] mlx5_core 0000:00:02.0: mlx5_cmd_eq_recover:247:(pid > 3122): Recovered 1 EQEs on cmd_eq > [ 230.681799] mlx5_core 0000:00:02.0: > wait_func_handle_exec_timeout:1113:(pid 3122): cmd[0]: CREATE_CQ(0x400) > recovered after timeout > [ 230.683266] __common_interrupt: 3.34 No irq handler for vector > > Given the fact that amd_ir_set_vcpu_affinity() uses > amd_iommu_activate_guest_mode() underneath it essentially means that VCPU > affinity changes of IRTEs are nops. Fix it by dropping the check for > guest-mode at amd_iommu_activate_guest_mode(). Same thing is applicable to > amd_iommu_deactivate_guest_mode() although, even if the IRTE doesn't change > underlying DestID on the host, the VFIO IRQ handler will still be able to > poke at the right guest-vCPU. Is there any harm in giving deactivate the same treatement? If the worst case scenario is a few wasted cycles, having symmetric flows and eliminating benign bugs seems like a worthwhile tradeoff (assuming this is indeed a relatively slow path like I think it is). > Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") > Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx> > Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx> > --- > drivers/iommu/amd/iommu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c > index 5a505ba5467e..bf3ebc9d6cde 100644 > --- a/drivers/iommu/amd/iommu.c > +++ b/drivers/iommu/amd/iommu.c > @@ -3485,7 +3485,7 @@ int amd_iommu_activate_guest_mode(void *data) Any chance you (or anyone) would want to create a follow-up series to rename and/or rework these flows to make it more obvious that the helpers handle updates as well as transitions between "guest mode" and "host mode"? E.g. I can see KVM getting clever and skipping the "activation" when KVM knows AVIC is already active (though I can't tell for certain whether or not that would actually be problematic). > u64 valid; > > if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) || > - !entry || entry->lo.fields_vapic.guest_mode) > + !entry) This can easily fit on the previous line. if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) || !entry) return 0;