Re: [PATCH v3 05/22] KVM: Add KVM_CAP_MEMORY_FAULT_INFO

Kautuk Consul <kconsul@xxxxxxxxxxxxxxxxxx> · Tue, 4 Jul 2023 15:40:45 +0530



On 2023-04-12 21:34:53, Anish Moorthy wrote:
> KVM_CAP_MEMORY_FAULT_INFO allows kvm_run to return useful information
> besides a return value of -1 and errno of EFAULT when a vCPU fails an
> access to guest memory.
> 
> Add documentation, updates to the KVM headers, and a helper function
> (kvm_populate_efault_info) for implementing the capability.
> 
> Besides simply filling the run struct, kvm_populate_efault_info takes
> two safety measures
> 
>   a. It tries to prevent concurrent fills on a single vCPU run struct
>      by checking that the run struct being modified corresponds to the
>      currently loaded vCPU.
>   b. It tries to avoid filling an already-populated run struct by
>      checking whether the exit reason has been modified since entry
>      into KVM_RUN.
> 
> Finally, mark KVM_CAP_MEMORY_FAULT_INFO as available on arm64 and x86,
> even though EFAULT annotation are currently totally absent. Picking a
> point to declare the implementation "done" is difficult because
> 
>   1. Annotations will be performed incrementally in subsequent commits
>      across both core and arch-specific KVM.
>   2. The initial series will very likely miss some cases which need
>      annotation. Although these omissions are to be fixed in the future,
>      userspace thus still needs to expect and be able to handle
>      unannotated EFAULTs.
> 
> Given these qualifications, just marking it available here seems the
> least arbitrary thing to do.
> 
> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Anish Moorthy <amoorthy@xxxxxxxxxx>
> ---
>  Documentation/virt/kvm/api.rst | 35 +++++++++++++++++++++++++++
>  arch/arm64/kvm/arm.c           |  1 +
>  arch/x86/kvm/x86.c             |  1 +
>  include/linux/kvm_host.h       | 12 ++++++++++
>  include/uapi/linux/kvm.h       | 16 +++++++++++++
>  tools/include/uapi/linux/kvm.h | 11 +++++++++
>  virt/kvm/kvm_main.c            | 44 ++++++++++++++++++++++++++++++++++
>  7 files changed, 120 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 48fad65568227..f174f43c38d45 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6637,6 +6637,18 @@ array field represents return values. The userspace should update the return
>  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>  spec refer, https://github.com/riscv/riscv-sbi-doc.
>  
> +::
> +
> +		/* KVM_EXIT_MEMORY_FAULT */
> +		struct {
> +			__u64 flags;
> +			__u64 gpa;
> +			__u64 len; /* in bytes */
> +		} memory_fault;
> +
> +Indicates a vCPU memory fault on the guest physical address range
> +[gpa, gpa + len). See KVM_CAP_MEMORY_FAULT_INFO for more details.
> +
>  ::
>  
>      /* KVM_EXIT_NOTIFY */
> @@ -7670,6 +7682,29 @@ This capability is aimed to mitigate the threat that malicious VMs can
>  cause CPU stuck (due to event windows don't open up) and make the CPU
>  unavailable to host or other VMs.
>  
> +7.34 KVM_CAP_MEMORY_FAULT_INFO
> +------------------------------
> +
> +:Architectures: x86, arm64
> +:Parameters: args[0] - KVM_MEMORY_FAULT_INFO_ENABLE|DISABLE to enable/disable
> +             the capability.
> +:Returns: 0 on success, or -EINVAL if unsupported or invalid args[0].
> +
> +When enabled, EFAULTs "returned" by KVM_RUN in response to failed vCPU guest
> +memory accesses may be annotated with additional information. When KVM_RUN
> +returns an error with errno=EFAULT, userspace may check the exit reason: if it
> +is KVM_EXIT_MEMORY_FAULT, userspace is then permitted to read the 'memory_fault'
> +member of the run struct.
> +
> +The 'gpa' and 'len' (in bytes) fields describe the range of guest
> +physical memory to which access failed, i.e. [gpa, gpa + len). 'flags' is
> +currently always zero.
> +
> +NOTE: The implementation of this capability is incomplete. Even with it enabled,
> +userspace may receive "bare" EFAULTs (i.e. exit reason !=
> +KVM_EXIT_MEMORY_FAULT) from KVM_RUN. These should be considered bugs and
> +reported to the maintainers.
> +
>  8. Other capabilities.
>  ======================
>  
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index a43e1cb3b7e97..a932346b59f61 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -220,6 +220,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_VCPU_ATTRIBUTES:
>  	case KVM_CAP_PTP_KVM:
>  	case KVM_CAP_ARM_SYSTEM_SUSPEND:
> +	case KVM_CAP_MEMORY_FAULT_INFO:
>  		r = 1;
>  		break;
>  	case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ca73eb066af81..0925678e741de 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4432,6 +4432,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_VAPIC:
>  	case KVM_CAP_ENABLE_CAP:
>  	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
> +	case KVM_CAP_MEMORY_FAULT_INFO:
>  		r = 1;
>  		break;
>  	case KVM_CAP_EXIT_HYPERCALL:
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 90edc16d37e59..776f9713f3921 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -805,6 +805,8 @@ struct kvm {
>  	struct notifier_block pm_notifier;
>  #endif
>  	char stats_id[KVM_STATS_NAME_SIZE];
> +
> +	bool fill_efault_info;
>  };
>  
>  #define kvm_err(fmt, ...) \
> @@ -2277,4 +2279,14 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
>  /* Max number of entries allowed for each kvm dirty ring */
>  #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
>  
> +/*
> + * Attempts to set the run struct's exit reason to KVM_EXIT_MEMORY_FAULT and
> + * populate the memory_fault field with the given information.
> + *
> + * Does nothing if KVM_CAP_MEMORY_FAULT_INFO is not enabled. WARNs and does
> + * nothing if the exit reason is not KVM_EXIT_UNKNOWN, or if 'vcpu' is not
> + * the current running vcpu.
> + */
> +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu,
> +					uint64_t gpa, uint64_t len);
>  #endif
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4003a166328cc..bc73e8381a2bb 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -264,6 +264,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_RISCV_SBI        35
>  #define KVM_EXIT_RISCV_CSR        36
>  #define KVM_EXIT_NOTIFY           37
> +#define KVM_EXIT_MEMORY_FAULT     38
>  
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -505,6 +506,16 @@ struct kvm_run {
>  #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
>  			__u32 flags;
>  		} notify;
> +		/* KVM_EXIT_MEMORY_FAULT */
> +		struct {
> +			/*
> +			 * Indicates a memory fault on the guest physical address range
> +			 * [gpa, gpa + len). flags is always zero for now.
> +			 */
> +			__u64 flags;
> +			__u64 gpa;
> +			__u64 len; /* in bytes */
> +		} memory_fault;
>  		/* Fix the size of the union. */
>  		char padding[256];
>  	};
> @@ -1184,6 +1195,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
>  #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
>  #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
> +#define KVM_CAP_MEMORY_FAULT_INFO 227
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -2237,4 +2249,8 @@ struct kvm_s390_zpci_op {
>  /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
>  #define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
>  
> +/* flags for KVM_CAP_MEMORY_FAULT_INFO */
> +#define KVM_MEMORY_FAULT_INFO_DISABLE  0
> +#define KVM_MEMORY_FAULT_INFO_ENABLE   1
> +
>  #endif /* __LINUX_KVM_H */
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index 4003a166328cc..5c57796364d65 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -264,6 +264,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_RISCV_SBI        35
>  #define KVM_EXIT_RISCV_CSR        36
>  #define KVM_EXIT_NOTIFY           37
> +#define KVM_EXIT_MEMORY_FAULT     38
>  
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -505,6 +506,16 @@ struct kvm_run {
>  #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
>  			__u32 flags;
>  		} notify;
> +		/* KVM_EXIT_MEMORY_FAULT */
> +		struct {
> +			/*
> +			 * Indicates a memory fault on the guest physical address range
> +			 * [gpa, gpa + len). flags is always zero for now.
> +			 */
> +			__u64 flags;
> +			__u64 gpa;
> +			__u64 len; /* in bytes */
> +		} memory_fault;
>  		/* Fix the size of the union. */
>  		char padding[256];
>  	};
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index cf7d3de6f3689..f3effc93cbef3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>  	spin_lock_init(&kvm->mn_invalidate_lock);
>  	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
>  	xa_init(&kvm->vcpu_array);
> +	kvm->fill_efault_info = false;
>  
>  	INIT_LIST_HEAD(&kvm->gpc_list);
>  	spin_lock_init(&kvm->gpc_lock);
> @@ -4096,6 +4097,8 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  			put_pid(oldpid);
>  		}
>  		r = kvm_arch_vcpu_ioctl_run(vcpu);
> +		WARN_ON_ONCE(r == -EFAULT &&
> +					 vcpu->run->exit_reason != KVM_EXIT_MEMORY_FAULT);
>  		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
>  		break;
>  	}
> @@ -4672,6 +4675,15 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  
>  		return r;
>  	}
> +	case KVM_CAP_MEMORY_FAULT_INFO: {
> +		if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap)
> +			|| (cap->args[0] != KVM_MEMORY_FAULT_INFO_ENABLE
> +				&& cap->args[0] != KVM_MEMORY_FAULT_INFO_DISABLE)) {
> +			return -EINVAL;
> +		}
> +		kvm->fill_efault_info = cap->args[0] == KVM_MEMORY_FAULT_INFO_ENABLE;
> +		return 0;
> +	}
>  	default:
>  		return kvm_vm_ioctl_enable_cap(kvm, cap);
>  	}
> @@ -6173,3 +6185,35 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
>  
>  	return init_context.err;
>  }
> +
> +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu,
> +					uint64_t gpa, uint64_t len)
> +{
> +	if (!vcpu->kvm->fill_efault_info)
> +		return;
> +
> +	preempt_disable();
> +	/*
> +	 * Ensure the this vCPU isn't modifying another vCPU's run struct, which
> +	 * would open the door for races between concurrent calls to this
> +	 * function.
> +	 */
> +	if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu)))
> +		goto out;
Why use WARN_ON_ONCE when there is a clear possiblity of preemption
kicking in (with the possibility of vcpu_load/vcpu_put being called
in the new task) before preempt_disable() is called in this function ?
I think you should use WARN_ON_ONCE only where there is some impossible
situation happening, not when there is a possibility of that
situation happening as per the kernel code. I think that this WARN_ON_ONCE
could make sense if kvm_populate_efault_info() is called from atomic context,
but not when you are disabling preemption from this function itself.
Basically I don't think there is any way we can guarantee that
preemption DOESN'T kick in before the preempt_disable() such that
this warning is actually something that deserves to have a kernel
WARN_ON_ONCE() warning.
Can we get rid of this WARN_ON_ONCE and straightaway jump to the
out label if "(vcpu != __this_cpu_read(kvm_running_vcpu))" is true, or
please do correct me if I am wrong about something ?
> +	/*
> +	 * Try not to overwrite an already-populated run struct.
> +	 * This isn't a perfect solution, as there's no guarantee that the exit
> +	 * reason is set before the run struct is populated, but it should prevent
> +	 * at least some bugs.
> +	 */
> +	else if (WARN_ON_ONCE(vcpu->run->exit_reason != KVM_EXIT_UNKNOWN))
> +		goto out;
> +
> +	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> +	vcpu->run->memory_fault.gpa = gpa;
> +	vcpu->run->memory_fault.len = len;
> +	vcpu->run->memory_fault.flags = 0;
> +
> +out:
> +	preempt_enable();
> +}
> -- 
> 2.40.0.577.gac1e443424-goog
>