Hi Oliver,
On 3/24/22 1:11 AM, Oliver Upton wrote:
More comments, didn't see exactly how all of these structures are
getting used.
Ok, thanks for your review and comments.
On Tue, Mar 22, 2022 at 04:06:50PM +0800, Gavin Shan wrote:
[...]
diff --git a/arch/arm64/include/uapi/asm/kvm_sdei_state.h b/arch/arm64/include/uapi/asm/kvm_sdei_state.h
new file mode 100644
index 000000000000..b14844230117
--- /dev/null
+++ b/arch/arm64/include/uapi/asm/kvm_sdei_state.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Definitions of various KVM SDEI event states.
+ *
+ * Copyright (C) 2022 Red Hat, Inc.
+ *
+ * Author(s): Gavin Shan <gshan@xxxxxxxxxx>
+ */
+
+#ifndef _UAPI__ASM_KVM_SDEI_STATE_H
+#define _UAPI__ASM_KVM_SDEI_STATE_H
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+/*
+ * The software signaled event is the default one, which is
+ * defined in v1.1 specification.
+ */
+#define KVM_SDEI_INVALID_EVENT 0xFFFFFFFF
Isn't the constraint that bit 31 must be zero? (DEN 0054C 4.4 "Event
number allocation")
Yes, bit 31 of the event number should be zero, so this is an invalid
event number. It's used by struct kvm_sdei_vcpu_state::critical_num
and normal_num to indicate whether an event is being handled on the
corresponding vcpu: when those fields are set to KVM_SDEI_INVALID_EVENT,
no event is being handled on the vcpu.
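Roughly like this (kvm_sdei_vcpu_handling() is a made-up helper name;
the fields are the ones from struct kvm_sdei_vcpu_state in this patch):

/* Hypothetical helper: from SDEI's point of view, the vcpu is busy
 * handling an event when either slot holds a valid event number.
 */
static bool kvm_sdei_vcpu_handling(struct kvm_sdei_vcpu_state *state)
{
        return state->critical_num != KVM_SDEI_INVALID_EVENT ||
               state->normal_num != KVM_SDEI_INVALID_EVENT;
}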
+#define KVM_SDEI_DEFAULT_EVENT 0
+
+#define KVM_SDEI_MAX_VCPUS 512 /* Aligned to 64 */
+#define KVM_SDEI_MAX_EVENTS 128
I would *strongly* recommend against having these limits. I find the
vCPU limit especially concerning, because we're making KVM_MAX_VCPUS
ABI, which it definitely is not. Anything that deals with a vCPU should
be accessed through a vCPU FD (and thus agnostic to the maximum number
of vCPUs) to avoid such a complication.
For KVM_SDEI_DEFAULT_EVENT, which corresponds to the software signaled
event: as you suggested on PATCH[15/22], we can't assume its usage,
so I will replace it with SDEI_SW_SIGNALED_EVENT in uapi/linux/arm_sdei.h.
KVM_SDEI_MAX_EVENTS will be moved from this header file to kvm_sdei.h
once static arrays are used to hold the data structures or their
pointers, as you suggested earlier for this patch (PATCH[02/22]).
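Something like the below, where SDEI_SW_SIGNALED_EVENT takes the event
number the specification assigns to the software signaled event, and
the exact home of KVM_SDEI_MAX_EVENTS is still tentative:

/* include/uapi/linux/arm_sdei.h: the software signaled event number
 * defined by the SDEI specification.
 */
#define SDEI_SW_SIGNALED_EVENT          0

/* kvm_sdei.h (non-uapi): only used to size KVM's static arrays of
 * events, so it no longer needs to be ABI.
 */
#define KVM_SDEI_MAX_EVENTS             128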
There are two types of (SDEI) events: shared and private. A private
event can be registered independently on each vcpu, which means the
entry point address and argument, corresponding to @ep_address and
@ep_arg in struct kvm_sdei_registered_event, can differ between
individual vcpus. For a shared event, however, the registered/enabled
states and the entry point address and argument are the same on all
vcpus. KVM_SDEI_MAX_VCPUS was introduced so that the same data
structure could represent both shared and private events.
If data belonging to a particular vcpu should be accessed through the
vcpu fd, then we need to split or reorganize the data structures as below.
/*
 * The events are exposed through the ioctl interface or a similar
 * mechanism (synthetic system registers?) before they can be
 * registered. A struct kvm_sdei_exposed_event instance is reserved
 * from the kvm's static array on receiving the ioctl command from
 * the VMM.
 */
struct kvm_sdei_exposed_event {
        __u32   num;
        __u8    type;
        __u8    signaled;
        __u8    priority;
        __u8    padding;
};
/*
 * A struct kvm_sdei_registered_event instance is allocated or
 * reserved from a static array. For a shared event, the instance is
 * linked to the kvm; for a private event, it is allocated or reserved
 * from the vcpu's static array and linked to the vcpu.
 *
 * The instance is only allocated or reserved upon the
 * SDEI_EVENT_REGISTER hypercall.
 */
struct kvm_sdei_registered_event {
        __u32   num;
#define KVM_SDEI_EVENT_STATE_REGISTERED         (1 << 0)
#define KVM_SDEI_EVENT_STATE_ENABLE             (1 << 1)
#define KVM_SDEI_EVENT_STATE_UNREGISTER_PENDING (1 << 2)
        __u8    state;
        __u8    route_mode;
        __u8    padding[2];
        __u64   route_affinity;
        __u64   ep_address;
        __u64   ep_arg;
        __u64   notifier;
};
+struct kvm_sdei_exposed_event_state {
+ __u64 num;
+
+ __u8 type;
+ __u8 signaled;
+ __u8 priority;
+ __u8 padding[5];
+ __u64 notifier;
Wait, isn't this a kernel function pointer!?
Yeah, it is a kernel function pointer, used by Async PF to know
whether the corresponding event has been handled or not. Async PF can
cancel a previously injected event for performance reasons. Either
Async PF or SDEI needs to migrate it; to keep SDEI transparent enough
to Async PF, SDEI is responsible for its migration.
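Roughly, the in-kernel flow looks like the sketch below. The callback
type and the completion helper are simplified and hypothetical; the
point is just that @notifier lets Async PF learn that the guest has
finished handling the event:

typedef void (*kvm_sdei_notifier)(struct kvm_vcpu *vcpu, __u64 num);

static void kvm_sdei_event_completed(struct kvm_vcpu *vcpu,
                                     struct kvm_sdei_registered_event *event)
{
        kvm_sdei_notifier notifier = (kvm_sdei_notifier)event->notifier;

        /* Tell Async PF (or any other in-kernel user) that the guest
         * has completed the event.
         */
        if (notifier)
                notifier(vcpu, event->num);
}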
+};
+
+struct kvm_sdei_registered_event_state {
You should fold these fields together with kvm_sdei_exposed_event_state
into a single 'kvm_sdei_event' structure:
@route_mode and @route_affinity can't be configured or modified until
the event is registered. Besides, they're only valid for shared
events; private events have no routing to configure. This means those
two fields would be part of struct kvm_sdei_registered_event instead
of struct kvm_sdei_exposed_event.
+ __u64 num;
+
+ __u8 route_mode;
+ __u8 padding[3];
+ __u64 route_affinity;
And these shouldn't be UAPI at the VM scope. Each of these properties
could be accessed via a synthetic/'pseudo-firmware' register on a vCPU FD:
They're accessed through the vcpu or kvm fd depending on the event's
type: the VM-owned shared events are accessed through the KVM fd, and
the vcpu-owned private events through the vcpu fd.
I'm not sure I've caught the idea of a synthetic register, so let me
confirm. If I'm correct, you're talking about the "IMPLEMENTATION
DEFINED" system registers, whose Op0 and CRn are 0b11 and 0b1x11. If
two implementation-defined registers can be adopted, I don't think we
need to expose anything through the ABI; all the operations and the
needed data can be passed through the system registers:
  SYS_REG_SDEI_COMMAND
    Receives commands, such as exposing an event, registering an
    event, or changing vcpu state.
  SYS_REG_SDEI_DATA
    Carries the data needed by the received command.
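As a rough sketch of that idea, with made-up encodings and command
values just to illustrate the flow:

/* Hypothetical encodings in the IMPLEMENTATION DEFINED space
 * (Op0 == 0b11, CRn == 0b1x11). The CRm/Op2 values are placeholders,
 * not a proposal for the final numbers.
 */
#define SYS_REG_SDEI_COMMAND            sys_reg(3, 0, 15, 15, 0)
#define SYS_REG_SDEI_DATA               sys_reg(3, 0, 15, 15, 1)

/* Commands written to SYS_REG_SDEI_COMMAND. The payload for each
 * command (event number, entry point, vcpu state, ...) is staged
 * through SYS_REG_SDEI_DATA beforehand.
 */
#define KVM_SDEI_CMD_EXPOSE_EVENT       0
#define KVM_SDEI_CMD_REGISTER_EVENT     1
#define KVM_SDEI_CMD_SET_VCPU_STATE     2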
However, I'm not positive that synthetic registers can be used here.
When Mark Rutland reviewed "PATCH[RFC v1] Async PF support", he
indicated that implementation-defined registers can only be used in a
very limited way. At that time, a set of implementation-defined
registers was defined to identify asynchronous page faults and to
access the control data block, but the idea was rejected. Later on,
Marc recommended SDEI for Async PF.
https://www.spinics.net/lists/kvm-arm/msg40315.html
+ __u64 ep_address[KVM_SDEI_MAX_VCPUS];
+ __u64 ep_arg[KVM_SDEI_MAX_VCPUS];
+ __u64 registered[KVM_SDEI_MAX_VCPUS/64];
+ __u64 enabled[KVM_SDEI_MAX_VCPUS/64];
+ __u64 unregister_pending[KVM_SDEI_MAX_VCPUS/64];
+};
+
+struct kvm_sdei_vcpu_event_state {
+ __u64 num;
+
+ __u32 event_count;
+ __u32 padding;
+};
+
+struct kvm_sdei_vcpu_regs_state {
+ __u64 regs[18];
+ __u64 pc;
+ __u64 pstate;
+};
+
+struct kvm_sdei_vcpu_state {
Same goes here, I strongly recommend you try to expose this through the
KVM_{GET,SET}_ONE_REG interface if at all possible since it
significantly reduces the UAPI burden, both on KVM to maintain it and
VMMs to actually use it.
Yeah, it's much more convenient to use the implementation-defined
registers here. However, I'm not positive we can do this; please see
the details I provided above :)
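If the ONE_REG route does work out, the VMM side would look something
like the below; the register ID is a made-up SDEI pseudo-firmware
register, and the index would need to be properly reserved in the UAPI:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical SDEI pseudo-firmware register; index 3 is only a
 * placeholder.
 */
#define KVM_REG_ARM_SDEI_VCPU_STATE \
        (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_FW | 3)

static int sdei_set_vcpu_state(int vcpu_fd, __u64 val)
{
        struct kvm_one_reg reg = {
                .id   = KVM_REG_ARM_SDEI_VCPU_STATE,
                .addr = (__u64)&val,
        };

        /* For migration, the VMM saves the state with KVM_GET_ONE_REG
         * on the source and restores it here on the destination.
         */
        return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}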
Thanks,
Gavin
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm