Re: [PATCH v4 2/2] KVM: x86: KVM_CAP_SYNC_REGS

Wanpeng, thanks for noticing that! I was able to replicate the problem
and identified the cause.

Radim, in the merged kvm/queue version of arch/x86/kvm/x86.c,
vcpu_put() was mistakenly deleted from
kvm_arch_vcpu_ioctl_set_mpstate() and inserted into __set_sregs().

So kvm_arch_vcpu_ioctl_set_sregs() now calls vcpu_put() twice, while
kvm_arch_vcpu_ioctl_set_mpstate() never calls it at all. A #GP soon follows! ;-)
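
Why that blows up: vcpu_put() is what unregisters the vcpu's preempt
notifier, so the second call walks the list pointers that the first one
already poisoned (the dead000000000100 value in Wanpeng's trace below).
What the fix restores is the usual bracket of exactly one
vcpu_load()/vcpu_put() pair around the ioctl body; roughly this shape,
sketched here rather than quoted verbatim from kvm/queue:

int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
				    struct kvm_mp_state *mp_state)
{
	int ret = -EINVAL;

	vcpu_load(vcpu);	/* registers the preempt notifier */

	/* ... validate and apply mp_state ... */

	ret = 0;
out:
	vcpu_put(vcpu);		/* exactly one put per load */
	return ret;
}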

I can send a patch, but since the issue is only on kvm/queue, I'm not sure
what the protocol is. Anyhow, here's a diff of the corrected code.
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3354dea4d03a..786cd0026ca7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7678,6 +7678,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,

  ret = 0;
 out:
+ vcpu_put(vcpu);
  return ret;
 }

@@ -7811,7 +7812,6 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)

  ret = 0;
 out:
- vcpu_put(vcpu);
  return ret;
 }
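
While I'm here, for anyone skimming the thread: the flow the patch below
enables looks roughly like this from the userspace side. This is only a
sketch (it assumes a vcpu fd and its mmap'd kvm_run are already set up the
usual way), not code taken from the patch itself:

	struct kvm_run *run = /* mmap of vcpu_fd */;

	/* ask KVM to copy these sets out to kvm_run on the next exit */
	run->kvm_valid_regs = KVM_SYNC_X86_REGS | KVM_SYNC_X86_SREGS;
	ioctl(vcpu_fd, KVM_RUN, 0);

	/* inspect/modify run->s.regs.regs, run->s.regs.sregs, ... */
	run->s.regs.regs.rip += 2;	/* e.g. step over an emulated 2-byte insn */

	/* mark what was dirtied so KVM copies it back in before re-entry */
	run->kvm_dirty_regs = KVM_SYNC_X86_REGS;
	ioctl(vcpu_fd, KVM_RUN, 0);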

-- 

On Mon, Feb 5, 2018 at 4:46 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
> 2018-02-06 8:32 GMT+08:00 Ken Hofsass <hofsass@xxxxxxxxxx>:
>> Hi Wanpeng, that's disturbing! Could you give me more context? Did
>> this occur while running part of the KVM unit tests? Does this happen
>> every time? Do you know if this is the first invocation of
>> kvm_arch_vcpu_ioctl_set_sregs? When I ran against kvm-unit-tests
>> w/qemu, I didn't hit this. Nor when I ran our other internal tests.
>
> I just ran a kvm guest (ubuntu 16.04, 4.15-rc3) against kvm/queue w/
> CONFIG_PREEMPT enabled.
>
> Regards,
> Wanpeng Li
>
>> thanks,
>> Ken Hofsass
>>
>>
>> On Sun, Feb 4, 2018 at 5:29 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>>> 2018-02-01 8:03 GMT+08:00 Ken Hofsass <hofsass@xxxxxxxxxx>:
>>>> This commit implements an enhanced x86 version of S390
>>>> KVM_CAP_SYNC_REGS functionality. KVM_CAP_SYNC_REGS "allow[s]
>>>> userspace to access certain guest registers without having
>>>> to call SET/GET_*REGS". This reduces ioctl overhead which
>>>> is particularly important when userspace is making synchronous
>>>> guest state modifications (e.g. when emulating and/or intercepting
>>>> instructions).
>>>>
>>>> Originally implemented upstream for the S390, the x86 differences
>>>> follow:
>>>> - userspace can select the register sets to be synchronized with kvm_run
>>>> using bit-flags in the kvm_valid_registers and kvm_dirty_registers
>>>> fields.
>>>> - vcpu_events is available in addition to the regs and sregs register
>>>> sets.
>>>>
>>>> Signed-off-by: Ken Hofsass <hofsass@xxxxxxxxxx>
>>>
>>> This commit in kvm/queue results in:
>>>
>>>  general protection fault: 0000 [#1] PREEMPT SMP PTI
>>>  CPU: 7 PID: 2348 Comm: qemu-system-x86 Not tainted 4.15.0+ #4
>>>  Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS
>>> FBKTC1AUS 02/16/2016
>>>  RIP: 0010:preempt_notifier_unregister+0x13/0x40
>>>  RSP: 0018:ffffbe6541c07da8 EFLAGS: 00010286
>>>  RAX: dead000000000100 RBX: ffff99ed46ca8000 RCX: 0000000037271e7b
>>>  RDX: dead000000000200 RSI: 0000000000000000 RDI: ffff99ed46ca8008
>>>  RBP: ffffbe6541c07da8 R08: 00000000fe0913b4 R09: 442c1f7a00000000
>>>  R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
>>>  R13: ffff99ed46ca8000 R14: ffff99ed46ca8058 R15: ffff99ed47635400
>>>  FS:  00007f3c2b474700(0000) GS:ffff99ed8e200000(0000) knlGS:0000000000000000
>>>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>  CR2: 00007f3c1c001000 CR3: 00000003c73f4006 CR4: 00000000001626e0
>>>  Call Trace:
>>>   vcpu_put+0x28/0x50 [kvm]
>>>   kvm_arch_vcpu_ioctl_set_sregs+0x2d/0x40 [kvm]
>>>   kvm_vcpu_ioctl+0x54a/0x720 [kvm]
>>>   ? __fget+0xfc/0x210
>>>   ? __fget+0xfc/0x210
>>>   do_vfs_ioctl+0xa4/0x6a0
>>>   ? __fget+0x11d/0x210
>>>   SyS_ioctl+0x79/0x90
>>>   entry_SYSCALL_64_fastpath+0x25/0x9c
>>>  RIP: 0033:0x7f3c36fd0f47
>>>  RSP: 002b:00007f3c2b473668 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>>>  RAX: ffffffffffffffda RBX: 0000555a11f78c10 RCX: 00007f3c36fd0f47
>>>  RDX: 00007f3c2b473710 RSI: 000000004138ae84 RDI: 000000000000000f
>>>  RBP: 0000000000000003 R08: 0000555a107825b0 R09: 00007f3c1c001000
>>>  R10: 00007f3c1c003010 R11: 0000000000000246 R12: 00007f3c2b473870
>>>  R13: 00007f3c1c001000 R14: 00007f3c2b4749c0 R15: 0000555a11f78c10
>>>  RIP: preempt_notifier_unregister+0x13/0x40 RSP: ffffbe6541c07da8
>>>  ---[ end trace 318a26af2b7ef2bc ]---
>>>
>>> Regards,
>>> Wanpeng Li
>>>
>>>> ---
>>>>  Documentation/virtual/kvm/api.txt |  40 ++++++++++++++
>>>>  arch/x86/include/uapi/asm/kvm.h   |  19 ++++++-
>>>>  arch/x86/kvm/x86.c                | 108 +++++++++++++++++++++++++++++++++-----
>>>>  3 files changed, 152 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index de55fe173afe..97bc2d0e69aa 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -4014,6 +4014,46 @@ Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be
>>>>  accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
>>>>  the guest.
>>>>
>>>> +6.74 KVM_CAP_SYNC_REGS
>>>> +Architectures: s390, x86
>>>> +Target: s390: always enabled, x86: vcpu
>>>> +Parameters: none
>>>> +Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register
>>>> +sets are supported (bitfields defined in arch/x86/include/uapi/asm/kvm.h).
>>>> +
>>>> +As described above in the kvm_sync_regs struct info in section 5 (kvm_run):
>>>> +KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers
>>>> +without having to call SET/GET_*REGS". This reduces overhead by eliminating
>>>> +repeated ioctl calls for setting and/or getting register values. This is
>>>> +particularly important when userspace is making synchronous guest state
>>>> +modifications, e.g. when emulating and/or intercepting instructions in
>>>> +userspace.
>>>> +
>>>> +For s390 specifics, please refer to the source code.
>>>> +
>>>> +For x86:
>>>> +- the register sets to be copied out to kvm_run are selectable
>>>> +  by userspace (rather than all sets being copied out for every exit).
>>>> +- vcpu_events are available in addition to regs and sregs.
>>>> +
>>>> +For x86, the 'kvm_valid_regs' field of struct kvm_run is overloaded to
>>>> +function as an input bit-array field set by userspace to indicate the
>>>> +specific register sets to be copied out on the next exit.
>>>> +
>>>> +To indicate when userspace has modified values that should be copied into
>>>> +the vCPU, the all architecture bitarray field, 'kvm_dirty_regs' must be set.
>>>> +This is done using the same bitflags as for the 'kvm_valid_regs' field.
>>>> +If the dirty bit is not set, then the register set values will not be copied
>>>> +into the vCPU even if they've been modified.
>>>> +
>>>> +Unused bitfields in the bitarrays must be set to zero.
>>>> +
>>>> +struct kvm_sync_regs {
>>>> +        struct kvm_regs regs;
>>>> +        struct kvm_sregs sregs;
>>>> +        struct kvm_vcpu_events events;
>>>> +};
>>>> +
>>>>  7. Capabilities that can be enabled on VMs
>>>>  ------------------------------------------
>>>>
>>>> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
>>>> index f3a960488eae..c535c2fdea13 100644
>>>> --- a/arch/x86/include/uapi/asm/kvm.h
>>>> +++ b/arch/x86/include/uapi/asm/kvm.h
>>>> @@ -354,8 +354,25 @@ struct kvm_xcrs {
>>>>         __u64 padding[16];
>>>>  };
>>>>
>>>> -/* definition of registers in kvm_run */
>>>> +#define KVM_SYNC_X86_REGS      (1UL << 0)
>>>> +#define KVM_SYNC_X86_SREGS     (1UL << 1)
>>>> +#define KVM_SYNC_X86_EVENTS    (1UL << 2)
>>>> +
>>>> +#define KVM_SYNC_X86_VALID_FIELDS \
>>>> +       (KVM_SYNC_X86_REGS| \
>>>> +        KVM_SYNC_X86_SREGS| \
>>>> +        KVM_SYNC_X86_EVENTS)
>>>> +
>>>> +/* kvm_sync_regs struct included by kvm_run struct */
>>>>  struct kvm_sync_regs {
>>>> +       /* Members of this structure are potentially malicious.
>>>> +        * Care must be taken by code reading, esp. interpreting,
>>>> +        * data fields from them inside KVM to prevent TOCTOU and
>>>> +        * double-fetch types of vulnerabilities.
>>>> +        */
>>>> +       struct kvm_regs regs;
>>>> +       struct kvm_sregs sregs;
>>>> +       struct kvm_vcpu_events events;
>>>>  };
>>>>
>>>>  #define KVM_X86_QUIRK_LINT0_REENABLED  (1 << 0)
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 0204b2b8a293..ffd2c4e5d245 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -100,6 +100,9 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu);
>>>>  static void process_nmi(struct kvm_vcpu *vcpu);
>>>>  static void enter_smm(struct kvm_vcpu *vcpu);
>>>>  static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
>>>> +static void store_regs(struct kvm_vcpu *vcpu);
>>>> +static int sync_regs(struct kvm_vcpu *vcpu);
>>>> +static int check_valid_regs(struct kvm_vcpu *vcpu);
>>>>
>>>>  struct kvm_x86_ops *kvm_x86_ops __read_mostly;
>>>>  EXPORT_SYMBOL_GPL(kvm_x86_ops);
>>>> @@ -2743,6 +2746,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>         case KVM_CAP_IMMEDIATE_EXIT:
>>>>                 r = 1;
>>>>                 break;
>>>> +       case KVM_CAP_SYNC_REGS:
>>>> +               r = KVM_SYNC_X86_VALID_FIELDS;
>>>> +               break;
>>>>         case KVM_CAP_ADJUST_CLOCK:
>>>>                 r = KVM_CLOCK_TSC_STABLE;
>>>>                 break;
>>>> @@ -7325,6 +7331,12 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
>>>>         return 0;
>>>>  }
>>>>
>>>> +static int check_valid_regs(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       if (vcpu->run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS)
>>>> +               return -EINVAL;
>>>> +       return 0;
>>>> +}
>>>>
>>>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>>>  {
>>>> @@ -7351,6 +7363,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>>>                 goto out;
>>>>         }
>>>>
>>>> +       if (vcpu->run->kvm_valid_regs) {
>>>> +               r = check_valid_regs(vcpu);
>>>> +               if (r != 0)
>>>> +                       goto out;
>>>> +       }
>>>> +       if (vcpu->run->kvm_dirty_regs) {
>>>> +               r = sync_regs(vcpu);
>>>> +               if (r != 0)
>>>> +                       goto out;
>>>> +       }
>>>> +
>>>>         /* re-sync apic's tpr */
>>>>         if (!lapic_in_kernel(vcpu)) {
>>>>                 if (kvm_set_cr8(vcpu, kvm_run->cr8) != 0) {
>>>> @@ -7375,6 +7398,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>>>
>>>>  out:
>>>>         kvm_put_guest_fpu(vcpu);
>>>> +       if (vcpu->run->kvm_valid_regs)
>>>> +               store_regs(vcpu);
>>>>         post_kvm_run_save(vcpu);
>>>>         kvm_sigset_deactivate(vcpu);
>>>>
>>>> @@ -7382,10 +7407,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>>>         return r;
>>>>  }
>>>>
>>>> -int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>>>> +static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>>>>  {
>>>> -       vcpu_load(vcpu);
>>>> -
>>>>         if (vcpu->arch.emulate_regs_need_sync_to_vcpu) {
>>>>                 /*
>>>>                  * We are here if userspace calls get_regs() in the middle of
>>>> @@ -7418,15 +7441,18 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>>>>
>>>>         regs->rip = kvm_rip_read(vcpu);
>>>>         regs->rflags = kvm_get_rflags(vcpu);
>>>> +}
>>>>
>>>> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>>>> +{
>>>> +       vcpu_load(vcpu);
>>>> +       __get_regs(vcpu, regs);
>>>>         vcpu_put(vcpu);
>>>>         return 0;
>>>>  }
>>>>
>>>> -int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>>>> +static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>>>>  {
>>>> -       vcpu_load(vcpu);
>>>> -
>>>>         vcpu->arch.emulate_regs_need_sync_from_vcpu = true;
>>>>         vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
>>>>
>>>> @@ -7455,7 +7481,12 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>>>>         vcpu->arch.exception.pending = false;
>>>>
>>>>         kvm_make_request(KVM_REQ_EVENT, vcpu);
>>>> +}
>>>>
>>>> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>>>> +{
>>>> +       vcpu_load(vcpu);
>>>> +       __set_regs(vcpu, regs);
>>>>         vcpu_put(vcpu);
>>>>         return 0;
>>>>  }
>>>> @@ -7470,13 +7501,10 @@ void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(kvm_get_cs_db_l_bits);
>>>>
>>>> -int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
>>>> -                                 struct kvm_sregs *sregs)
>>>> +static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>>>>  {
>>>>         struct desc_ptr dt;
>>>>
>>>> -       vcpu_load(vcpu);
>>>> -
>>>>         kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
>>>>         kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
>>>>         kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);
>>>> @@ -7507,7 +7535,13 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
>>>>         if (vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft)
>>>>                 set_bit(vcpu->arch.interrupt.nr,
>>>>                         (unsigned long *)sregs->interrupt_bitmap);
>>>> +}
>>>>
>>>> +int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
>>>> +                                 struct kvm_sregs *sregs)
>>>> +{
>>>> +       vcpu_load(vcpu);
>>>> +       __get_sregs(vcpu, sregs);
>>>>         vcpu_put(vcpu);
>>>>         return 0;
>>>>  }
>>>> @@ -7579,8 +7613,7 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(kvm_task_switch);
>>>>
>>>> -int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>>> -                                 struct kvm_sregs *sregs)
>>>> +static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>>>>  {
>>>>         struct msr_data apic_base_msr;
>>>>         int mmu_reset_needed = 0;
>>>> @@ -7588,8 +7621,6 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>>>         struct desc_ptr dt;
>>>>         int ret = -EINVAL;
>>>>
>>>> -       vcpu_load(vcpu);
>>>> -
>>>>         if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
>>>>                         (sregs->cr4 & X86_CR4_OSXSAVE))
>>>>                 goto out;
>>>> @@ -7665,6 +7696,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>>>
>>>>         ret = 0;
>>>>  out:
>>>> +       return ret;
>>>> +}
>>>> +
>>>> +int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>>> +                                 struct kvm_sregs *sregs)
>>>> +{
>>>> +       int ret;
>>>> +
>>>> +       vcpu_load(vcpu);
>>>> +       ret = __set_sregs(vcpu, sregs);
>>>>         vcpu_put(vcpu);
>>>>         return ret;
>>>>  }
>>>> @@ -7791,6 +7832,45 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
>>>>         return 0;
>>>>  }
>>>>
>>>> +static void store_regs(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
>>>> +
>>>> +       if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_REGS)
>>>> +               __get_regs(vcpu, &vcpu->run->s.regs.regs);
>>>> +
>>>> +       if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_SREGS)
>>>> +               __get_sregs(vcpu, &vcpu->run->s.regs.sregs);
>>>> +
>>>> +       if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_EVENTS)
>>>> +               kvm_vcpu_ioctl_x86_get_vcpu_events(
>>>> +                               vcpu, &vcpu->run->s.regs.events);
>>>> +}
>>>> +
>>>> +static int sync_regs(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       if (vcpu->run->kvm_dirty_regs & ~KVM_SYNC_X86_VALID_FIELDS)
>>>> +               return -EINVAL;
>>>> +
>>>> +       if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_REGS) {
>>>> +               __set_regs(vcpu, &vcpu->run->s.regs.regs);
>>>> +               vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_REGS;
>>>> +       }
>>>> +       if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_SREGS) {
>>>> +               if (__set_sregs(vcpu, &vcpu->run->s.regs.sregs))
>>>> +                       return -EINVAL;
>>>> +               vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
>>>> +       }
>>>> +       if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_EVENTS) {
>>>> +               if (kvm_vcpu_ioctl_x86_set_vcpu_events(
>>>> +                               vcpu, &vcpu->run->s.regs.events))
>>>> +                       return -EINVAL;
>>>> +               vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_EVENTS;
>>>> +       }
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>>  static void fx_init(struct kvm_vcpu *vcpu)
>>>>  {
>>>>         fpstate_init(&vcpu->arch.guest_fpu.state);
>>>> --
>>>> 2.16.0.rc1.238.g530d649a79-goog
>>>>



