Re: [RFC 02/33] KVM: x86: Introduce KVM_CAP_APIC_ID_GROUPS

Nicolas Saenz Julienne <nsaenz@xxxxxxxxxx> · Fri, 10 Nov 2023 18:46:44 +0000

On Wed Nov 8, 2023 at 5:47 PM UTC, Sean Christopherson wrote:
> On Wed, Nov 08, 2023, Nicolas Saenz Julienne wrote:
> > From: Anel Orazgaliyeva <anelkz@xxxxxxxxx>
> >
> > Introduce KVM_CAP_APIC_ID_GROUPS, this capability segments the VM's APIC
> > ids into two. The lower bits, the physical APIC id, represent the part
> > that's exposed to the guest. The higher bits, which are private to KVM,
> > groups APICs together. APICs in different groups are isolated from each
> > other, and IPIs can only be directed at APICs that share the same group
> > as its source. Furthermore, groups are only relevant to IPIs, anything
> > incoming from outside the local APIC complex: from the IOAPIC, MSIs, or
> > PV-IPIs is targeted at the default APIC group, group 0.
> >
> > When routing IPIs with physical destinations, KVM will OR the source's
> > vCPU APIC group with the ICR's destination ID and use that to resolve
> > the target lAPIC.
>
> Is all of the above arbitrary KVM behavior or defined by the TLFS?

All this matches VSM's expectations to how interrupts are to be handled.
But APIC groups is a concept we created with the aim of generalizing the
behaviour as much as possible.

> > The APIC physical map is also made group aware in
> > order to speed up this process. For the sake of simplicity, the logical
> > map is not built while KVM_CAP_APIC_ID_GROUPS is in use and we defer IPI
> > routing to the slower per-vCPU scan method.
>
> Why?  I mean, I kinda sorta understand what it does for VSM, but it's not at all
> obvious why this information needs to be shoved into the APIC IDs.  E.g. why not
> have an explicit group_id and then maintain separate optimization maps for each?

There are three tricks to APIC groups. One is IPI routing: for ex. the
ICR phyical destination is mixed with the source vCPU's APIC group to
find the destination vCPU. Another is presenting a coherent APIC ID
across VTLs; switching VTLs shouldn't change the guest's view of the
APIC ID. And ultimately keeps track of the vCPU's VTL level. I don't wee
why we couldn't decouple them, as long as we keep filtering the APIC ID
before it reaches the guest.

> > This capability serves as a building block to implement virtualisation
> > based security features like Hyper-V's Virtual Secure Mode (VSM). VSM
> > introduces a para-virtualised switch that allows for guest CPUs to jump
> > into a different execution context, this switches into a different CPU
> > state, lAPIC state, and memory protections. We model this in KVM by
>
> Who is "we"?  As a general rule, avoid pronouns.  "we" and "us" in particular
> should never show up in a changelog.  I genuinely don't know if "we" means
> userspace or KVM, and the distinction matters because it clarifies whether or
> not KVM is actively involved in the modeling versus KVM being little more than a
> dumb pipe to provide the plumbing.

Sorry, I've been actively trying to avoid pronouns as you already
mentioned it on a previous review. This one made it through the cracks.

> > using distinct kvm_vcpus for each context.
> >
> > Moreover, execution contexts are hierarchical and its APICs are meant to
> > remain functional even when the context isn't 'scheduled in'.
>
> Please explain the relationship and rules of execution contexts.  E.g. are
> execution contexts the same thing as VTLs?  Do all "real" vCPUs belong to every
> execution context?  If so, is that a requirement?

I left a note about this in my reply to your questions in the cover
letter.

> > For example, we have to keep track of
> > timers' expirations, and interrupt execution of lesser priority contexts
> > when relevant. Hence the need to alias physical APIC ids, while keeping
> > the ability to target specific execution contexts.
> >
> > Signed-off-by: Anel Orazgaliyeva <anelkz@xxxxxxxxx>
> > Co-developed-by: Nicolas Saenz Julienne <nsaenz@xxxxxxxxxx>
> > Signed-off-by: Nicolas Saenz Julienne <nsaenz@xxxxxxxxxx>
> > ---
>
>
> > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> > index e1021517cf04..542bd208e52b 100644
> > --- a/arch/x86/kvm/lapic.h
> > +++ b/arch/x86/kvm/lapic.h
> > @@ -97,6 +97,8 @@ void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
> >  void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
> >  void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
> >  u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
> > +int kvm_vm_ioctl_set_apic_id_groups(struct kvm *kvm,
> > +                                 struct kvm_apic_id_groups *groups);
> >  void kvm_recalculate_apic_map(struct kvm *kvm);
> >  void kvm_apic_set_version(struct kvm_vcpu *vcpu);
> >  void kvm_apic_after_set_mcg_cap(struct kvm_vcpu *vcpu);
> > @@ -277,4 +279,35 @@ static inline u8 kvm_xapic_id(struct kvm_lapic *apic)
> >       return kvm_lapic_get_reg(apic, APIC_ID) >> 24;
> >  }
> >
> > +static inline u32 kvm_apic_id(struct kvm_vcpu *vcpu)
> > +{
> > +     return vcpu->vcpu_id & ~vcpu->kvm->arch.apic_id_group_mask;
>
> This is *extremely* misleading.  KVM forces the x2APIC ID to match vcpu_id, but
> in xAPIC mode the ID is fully writable.

Yes, although I'm under the impression that no sane OS will do so. We
can decouple the group from the APIC ID, but it still needs to be masked
before presenting it to the guest. So I guess we'll have to deal with
the eventuality of apic id writing one way or anoter (a warn only if VSM
is enabled?).

If we decide the APIC group uAPI is not worth it, we can always create
an ad-hoc VSM one that explicitly sets the kvm_vcpu's VTL. Then route
the VTL internally into the APIC which can use still groups (or a
similar concept).

Nicolas