On Wed, Jul 20, 2022, Kechen Lu wrote:
> > > @@ -6036,14 +6045,17 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > >  			break;
> > >  
> > >  		mutex_lock(&kvm->lock);
> > > -		if (kvm->created_vcpus)
> > > -			goto disable_exits_unlock;
> > > +		if (kvm->created_vcpus) {
> > 
> > I retract my comment about using a request, I got ahead of myself.
> > 
> > Don't update vCPUs, the whole point of adding the !kvm->created_vcpus
> > check was to avoid having to update vCPUs when the per-VM behavior
> > changed.
> > 
> > In other words, keep the restriction and drop the request.
> > 
> I see.  If we keep the restriction here and don't update vCPUs when
> kvm->created_vcpus is true, wouldn't the per-VM and per-vCPU assumptions
> be different?  Not sure if I understand correctly: for per-VM, we assume
> the cap can only be enabled before vCPU creation; for per-vCPU, the
> disabled exits can be toggled at runtime.

Yep.  The main reason being that there's no use case for changing per-VM
settings after vCPUs are created.  I.e. we could lift the restriction in
the future if a use case pops up, but until then, keep things simple.

> If I understand correctly, this also makes sense though.

Paging this all back in...

There are two (sane) options for defining KVM's ABI:

  1) KVM combines the per-VM and per-vCPU settings
  2) The per-vCPU settings override the per-VM settings

This series implements (2).  For (1), KVM would need to recheck the per-VM
state during the per-vCPU update, e.g. instead of simply modifying the
per-vCPU flags, the vCPU-scoped handler for KVM_CAP_X86_DISABLE_EXITS
would need to merge the incoming settings with the existing
kvm->arch.xxx_in_guest flags.

I like (2) because it's simpler to implement and document (merging state
is always messy) and is more flexible.  E.g. with (1), the only way to
have per-vCPU settings is for userspace to NOT set the per-VM disables and
then set disables on a per-vCPU basis.  Whereas with (2), userspace can
set (or not) the per-VM disables and then override as needed.
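Concretely, with (2) the vCPU-scoped handler can blindly overwrite the
vCPU's flags and never look at kvm->arch.  Very roughly (a sketch based on
this thread, not the series verbatim; the per-vCPU *_in_guest fields are
this series' addition and the names are approximate):

	case KVM_CAP_X86_DISABLE_EXITS:
		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
			return -EINVAL;

		/*
		 * Overwrite, don't merge: the incoming args fully define
		 * this vCPU's behavior, the per-VM kvm->arch flags are
		 * irrelevant once the vCPU-scoped cap is used.
		 */
		vcpu->arch.mwait_in_guest =
			!!(cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT);
		vcpu->arch.hlt_in_guest =
			!!(cap->args[0] & KVM_X86_DISABLE_EXITS_HLT);
		vcpu->arch.pause_in_guest =
			!!(cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE);
		vcpu->arch.cstate_in_guest =
			!!(cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE);

		/*
		 * A real implementation also needs to refresh this vCPU's
		 * HLT/PAUSE/etc. intercepts at this point; omitted here.
		 */
		return 0;

whereas (1) would need something like

		vcpu->arch.mwait_in_guest = kvm->arch.mwait_in_guest ||
			!!(cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT);

for every flag.  Note that the blind overwrite also lets userspace clear a
per-VM disable for a given vCPU by simply leaving the bit unset, which is
exactly the override behavior described above.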
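And to be explicit about "keep the restriction and drop the request": the
per-VM path in the hunk above keeps rejecting the cap once vCPUs exist,
i.e. keep

		mutex_lock(&kvm->lock);
		if (kvm->created_vcpus)
			goto disable_exits_unlock;

so the per-VM flags are immutable after vCPU creation and KVM never has to
propagate a per-VM change to existing vCPUs.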