From: Mihai Donțu <mdontu@xxxxxxxxxxxxxxx> Besides the pointer to the new structure, the patch adds to the kvm structure a reference counter (the new object will be used by the thread receiving introspection commands/events) and a completion variable (to signal that the VM can be hooked by the introspection tool). Signed-off-by: Mihai Donțu <mdontu@xxxxxxxxxxxxxxx> Co-developed-by: Mircea Cîrjaliu <mcirjaliu@xxxxxxxxxxxxxxx> Signed-off-by: Mircea Cîrjaliu <mcirjaliu@xxxxxxxxxxxxxxx> Signed-off-by: Adalbert Lazăr <alazar@xxxxxxxxxxxxxxx> --- Documentation/virtual/kvm/kvmi.rst | 75 ++++++++++++++++++++++++++++++ arch/x86/kvm/Kconfig | 7 +++ arch/x86/kvm/Makefile | 1 + include/linux/kvm_host.h | 4 ++ include/linux/kvmi.h | 23 +++++++++ include/uapi/linux/kvmi.h | 68 +++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 10 +++- virt/kvm/kvmi.c | 64 +++++++++++++++++++++++++ virt/kvm/kvmi_int.h | 12 +++++ 9 files changed, 263 insertions(+), 1 deletion(-) create mode 100644 Documentation/virtual/kvm/kvmi.rst create mode 100644 include/linux/kvmi.h create mode 100644 include/uapi/linux/kvmi.h create mode 100644 virt/kvm/kvmi.c create mode 100644 virt/kvm/kvmi_int.h diff --git a/Documentation/virtual/kvm/kvmi.rst b/Documentation/virtual/kvm/kvmi.rst new file mode 100644 index 000000000000..d54caf8d974f --- /dev/null +++ b/Documentation/virtual/kvm/kvmi.rst @@ -0,0 +1,75 @@ +========================================================= +KVMI - The kernel virtual machine introspection subsystem +========================================================= + +The KVM introspection subsystem provides a facility for applications running +on the host or in a separate VM, to control the execution of other VM-s +(pause, resume, shutdown), query the state of the vCPUs (GPRs, MSRs etc.), +alter the page access bits in the shadow page tables (only for the hardware +backed ones, eg. Intel's EPT) and receive notifications when events of +interest have taken place (shadow page table level faults, key MSR writes, +hypercalls etc.). Some notifications can be responded to with an action +(like preventing an MSR from being written), others are mere informative +(like breakpoint events which can be used for execution tracing). +With few exceptions, all events are optional. An application using this +subsystem will explicitly register for them. + +The use case that gave way for the creation of this subsystem is to monitor +the guest OS and as such the ABI/API is highly influenced by how the guest +software (kernel, applications) sees the world. For example, some events +provide information specific for the host CPU architecture +(eg. MSR_IA32_SYSENTER_EIP) merely because its leveraged by guest software +to implement a critical feature (fast system calls). + +At the moment, the target audience for KVMI are security software authors +that wish to perform forensics on newly discovered threats (exploits) or +to implement another layer of security like preventing a large set of +kernel rootkits simply by "locking" the kernel image in the shadow page +tables (ie. enforce .text r-x, .rodata rw- etc.). It's the latter case that +made KVMI a separate subsystem, even though many of these features are +available in the device manager (eg. QEMU). The ability to build a security +application that does not interfere (in terms of performance) with the +guest software asks for a specialized interface that is designed for minimum +overhead. + +API/ABI +======= + +This chapter describes the VMI interface used to monitor and control local +guests from a user application. + +Overview +-------- + +The interface is socket based, one connection for every VM. One end is in the +host kernel while the other is held by the user application (introspection +tool). + +The initial connection is established by an application running on the host +(eg. QEMU) that connects to the introspection tool and after a handshake the +socket is passed to the host kernel making all further communication take +place between it and the introspection tool. The initiating party (QEMU) can +close its end so that any potential exploits cannot take a hold of it. + +The socket protocol allows for commands and events to be multiplexed over +the same connection. As such, it is possible for the introspection tool to +receive an event while waiting for the result of a command. Also, it can +send a command while the host kernel is waiting for a reply to an event. + +The kernel side of the socket communication is blocking and will wait for +an answer from its peer indefinitely or until the guest is powered off +(killed), restarted or the peer goes away, at which point it will wake +up and properly cleanup as if the introspection subsystem has never been +used on that guest. Obviously, whether the guest can really continue +normal execution depends on whether the introspection tool has made any +modifications that require an active KVMI channel. + +Memory access safety +-------------------- + +The KVMI API gives access to the entire guest physical address space but +provides no information on which parts of it are system RAM and which are +device-specific memory (DMA, emulated MMIO, reserved by a passthrough +device etc.). It is up to the user to determine, using the guest operating +system data structures, the areas that are safe to access (code, stack, heap +etc.). diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 72fa955f4a15..f70a6a1b6814 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -96,6 +96,13 @@ config KVM_MMU_AUDIT This option adds a R/W kVM module parameter 'mmu_audit', which allows auditing of KVM MMU events at runtime. +config KVM_INTROSPECTION + bool "VM Introspection" + depends on KVM && (KVM_INTEL || KVM_AMD) + help + This option enables functions to control the execution of VM-s, query + the state of the vCPU-s (GPR-s, MSR-s etc.). + # OK, it's a little counter-intuitive to do this, but it puts it neatly under # the virtualization menu. source "drivers/vhost/Kconfig" diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 31ecf7a76d5a..312597bd47c7 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -7,6 +7,7 @@ KVM := ../../../virt/kvm kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o +kvm-$(CONFIG_KVM_INTROSPECTION) += $(KVM)/kvmi.o kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \ diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c38cc5eb7e73..582b0187f5a4 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -455,6 +455,10 @@ struct kvm { struct srcu_struct srcu; struct srcu_struct irq_srcu; pid_t userspace_pid; + + struct completion kvmi_completed; + refcount_t kvmi_ref; + void *kvmi; }; #define kvm_err(fmt, ...) \ diff --git a/include/linux/kvmi.h b/include/linux/kvmi.h new file mode 100644 index 000000000000..e36de3f9f3de --- /dev/null +++ b/include/linux/kvmi.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVMI_H__ +#define __KVMI_H__ + +#define kvmi_is_present() IS_ENABLED(CONFIG_KVM_INTROSPECTION) + +#ifdef CONFIG_KVM_INTROSPECTION + +int kvmi_init(void); +void kvmi_uninit(void); +void kvmi_create_vm(struct kvm *kvm); +void kvmi_destroy_vm(struct kvm *kvm); + +#else + +static inline int kvmi_init(void) { return 0; } +static inline void kvmi_uninit(void) { } +static inline void kvmi_create_vm(struct kvm *kvm) { } +static inline void kvmi_destroy_vm(struct kvm *kvm) { } + +#endif /* CONFIG_KVM_INTROSPECTION */ + +#endif diff --git a/include/uapi/linux/kvmi.h b/include/uapi/linux/kvmi.h new file mode 100644 index 000000000000..dbf63ad0862f --- /dev/null +++ b/include/uapi/linux/kvmi.h @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _UAPI__LINUX_KVMI_H +#define _UAPI__LINUX_KVMI_H + +/* + * KVMI structures and definitions + */ + +#include <linux/kernel.h> +#include <linux/types.h> + +#define KVMI_VERSION 0x00000001 + +enum { + KVMI_EVENT_REPLY = 0, + KVMI_EVENT = 1, + + KVMI_FIRST_COMMAND = 2, + + KVMI_GET_VERSION = 2, + KVMI_CHECK_COMMAND = 3, + KVMI_CHECK_EVENT = 4, + KVMI_GET_GUEST_INFO = 5, + KVMI_GET_VCPU_INFO = 6, + KVMI_PAUSE_VCPU = 7, + KVMI_CONTROL_VM_EVENTS = 8, + KVMI_CONTROL_EVENTS = 9, + KVMI_CONTROL_CR = 10, + KVMI_CONTROL_MSR = 11, + KVMI_CONTROL_VE = 12, + KVMI_GET_REGISTERS = 13, + KVMI_SET_REGISTERS = 14, + KVMI_GET_CPUID = 15, + KVMI_GET_XSAVE = 16, + KVMI_READ_PHYSICAL = 17, + KVMI_WRITE_PHYSICAL = 18, + KVMI_INJECT_EXCEPTION = 19, + KVMI_GET_PAGE_ACCESS = 20, + KVMI_SET_PAGE_ACCESS = 21, + KVMI_GET_MAP_TOKEN = 22, + KVMI_GET_MTRR_TYPE = 23, + KVMI_CONTROL_SPP = 24, + KVMI_GET_PAGE_WRITE_BITMAP = 25, + KVMI_SET_PAGE_WRITE_BITMAP = 26, + KVMI_CONTROL_CMD_RESPONSE = 27, + + KVMI_NEXT_AVAILABLE_COMMAND, + +}; + +enum { + KVMI_EVENT_UNHOOK = 0, + KVMI_EVENT_CR = 1, + KVMI_EVENT_MSR = 2, + KVMI_EVENT_XSETBV = 3, + KVMI_EVENT_BREAKPOINT = 4, + KVMI_EVENT_HYPERCALL = 5, + KVMI_EVENT_PF = 6, + KVMI_EVENT_TRAP = 7, + KVMI_EVENT_DESCRIPTOR = 8, + KVMI_EVENT_CREATE_VCPU = 9, + KVMI_EVENT_PAUSE_VCPU = 10, + KVMI_EVENT_SINGLESTEP = 11, + + KVMI_NUM_EVENTS +}; + +#endif /* _UAPI__LINUX_KVMI_H */ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 585845203db8..90e432d225ab 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -51,6 +51,7 @@ #include <linux/slab.h> #include <linux/sort.h> #include <linux/bsearch.h> +#include <linux/kvmi.h> #include <asm/processor.h> #include <asm/io.h> @@ -680,6 +681,8 @@ static struct kvm *kvm_create_vm(unsigned long type) if (r) goto out_err; + kvmi_create_vm(kvm); + spin_lock(&kvm_lock); list_add(&kvm->vm_list, &vm_list); spin_unlock(&kvm_lock); @@ -725,6 +728,7 @@ static void kvm_destroy_vm(struct kvm *kvm) int i; struct mm_struct *mm = kvm->mm; + kvmi_destroy_vm(kvm); kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm); kvm_destroy_vm_debugfs(kvm); kvm_arch_sync_events(kvm); @@ -1556,7 +1560,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, * Whoever called remap_pfn_range is also going to call e.g. * unmap_mapping_range before the underlying pages are freed, * causing a call to our MMU notifier. - */ + */ kvm_get_pfn(pfn); *p_pfn = pfn; @@ -4204,6 +4208,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, r = kvm_vfio_ops_init(); WARN_ON(r); + r = kvmi_init(); + WARN_ON(r); + return 0; out_unreg: @@ -4229,6 +4236,7 @@ EXPORT_SYMBOL_GPL(kvm_init); void kvm_exit(void) { + kvmi_uninit(); debugfs_remove_recursive(kvm_debugfs_dir); misc_deregister(&kvm_dev); kmem_cache_destroy(kvm_vcpu_cache); diff --git a/virt/kvm/kvmi.c b/virt/kvm/kvmi.c new file mode 100644 index 000000000000..20638743bd03 --- /dev/null +++ b/virt/kvm/kvmi.c @@ -0,0 +1,64 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * KVM introspection + * + * Copyright (C) 2017-2019 Bitdefender S.R.L. + * + */ +#include <uapi/linux/kvmi.h> +#include "kvmi_int.h" + +int kvmi_init(void) +{ + return 0; +} + +void kvmi_uninit(void) +{ +} + +struct kvmi * __must_check kvmi_get(struct kvm *kvm) +{ + if (refcount_inc_not_zero(&kvm->kvmi_ref)) + return kvm->kvmi; + + return NULL; +} + +static void kvmi_destroy(struct kvm *kvm) +{ +} + +static void kvmi_release(struct kvm *kvm) +{ + kvmi_destroy(kvm); + + complete(&kvm->kvmi_completed); +} + +/* This function may be called from atomic context and must not sleep */ +void kvmi_put(struct kvm *kvm) +{ + if (refcount_dec_and_test(&kvm->kvmi_ref)) + kvmi_release(kvm); +} + +void kvmi_create_vm(struct kvm *kvm) +{ + init_completion(&kvm->kvmi_completed); + complete(&kvm->kvmi_completed); +} + +void kvmi_destroy_vm(struct kvm *kvm) +{ + struct kvmi *ikvm; + + ikvm = kvmi_get(kvm); + if (!ikvm) + return; + + kvmi_put(kvm); + + /* wait for introspection resources to be released */ + wait_for_completion_killable(&kvm->kvmi_completed); +} diff --git a/virt/kvm/kvmi_int.h b/virt/kvm/kvmi_int.h new file mode 100644 index 000000000000..ac23ad6fc4df --- /dev/null +++ b/virt/kvm/kvmi_int.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVMI_INT_H__ +#define __KVMI_INT_H__ + +#include <linux/kvm_host.h> + +#define IKVM(kvm) ((struct kvmi *)((kvm)->kvmi)) + +struct kvmi { +}; + +#endif