Re: [PATCH v6 03/18] KVM: arm64: Add SDEI virtualization infrastructure

Hi Oliver,

On 4/24/22 2:43 AM, Oliver Upton wrote:
On Sat, Apr 23, 2022 at 10:18:49PM +0800, Gavin Shan wrote:
On 4/23/22 5:48 AM, Oliver Upton wrote:
On Sun, Apr 03, 2022 at 11:38:56PM +0800, Gavin Shan wrote:
Software Delegated Exception Interface (SDEI) provides a mechanism
for registering and servicing system events, as defined by the ARM
DEN0054C specification. One of these events will be used by Asynchronous
Page Fault (Async PF) to deliver notifications from host to guest.

The events are classified into shared and private ones according to
their scopes. Shared events are system or VM scoped, while private
events are CPU or vCPU scoped. Shared events can be registered,
enabled, unregistered and reset through hypercalls issued from any
vCPU, whereas private events are registered, enabled, unregistered
and reset only on the calling vCPU. The events are also classified
into critical and normal events according to their priority. During
event delivery and handling, a normal event can be preempted by a
critical event, but not the other way around: a critical event is
never preempted by a normal event.
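The preemption rule in the commit message can be captured in a small predicate. This is a hypothetical sketch, not code from the patch: the enum, the function name and the `has_running` flag are assumptions, and the thread below leans towards dropping critical events entirely.

```c
#include <stdbool.h>

/* Hypothetical priority levels; the patch's own representation may differ. */
enum sdei_priority {
	SDEI_PRIO_NORMAL,
	SDEI_PRIO_CRITICAL,
};

/*
 * Return true if an incoming event may preempt the handler of the event
 * currently running on the vCPU. has_running is false when no handler is
 * active, in which case any event can be delivered.
 */
static bool sdei_can_preempt(bool has_running, enum sdei_priority running,
			     enum sdei_priority incoming)
{
	if (!has_running)
		return true;

	/* Only a critical event may interrupt a normal event's handler. */
	return running == SDEI_PRIO_NORMAL && incoming == SDEI_PRIO_CRITICAL;
}
```

Note that an event never preempts a handler of the same priority, so a critical handler is never interrupted at all.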

We don't have any need for critical events though, right? We should avoid
building out the plumbing around the concept of critical events until
there is an actual use case for it.


The Async PF event is a critical event, as the guest needs to handle it immediately.

But that's the sticking point for me. IIUC, we're going to deliver an
async PF SDEI event to the PE that is waiting on a page so it can go do
something else and wait for the page to come in. Normal events preempt
~everything, critical events preempt even normal events.

How can the guest context switch and do something better at an arbitrary
instruction boundary (such as in an SDEI handler of normal priority)? If
a guest takes a page fault in that context, it may as well wait
synchronously for the page to come in.

And in the case of the page ready event, we still need to clean up shop
before switching to the unblocked context.


Async PF works exactly as you describe. The normal event handler
is executed in EL1, and no Async PF event is triggered while the vCPU
runs in EL1. So a page fault in the normal event context is always
resolved synchronously. The page ready notification is delivered by a
PPI instead of an SDEI event, but yes, we need to clean up before
switching back to the previously suspended context.
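The "no Async PF while in EL1" rule above could be expressed as a check on the guest's PSTATE at fault time. A minimal sketch, assuming an AArch64 guest (AArch32 modes are ignored) and a hypothetical helper name; the real series may gate this differently:

```c
#include <stdbool.h>
#include <stdint.h>

/* For AArch64 state, the exception level lives in PSTATE.M[3:2]. */
#define PSTATE_EL_SHIFT	2
#define PSTATE_EL_MASK	0x3UL

static unsigned int pstate_to_el(uint64_t pstate)
{
	return (pstate >> PSTATE_EL_SHIFT) & PSTATE_EL_MASK;
}

/*
 * Hypothetical check: raise an Async PF notification only when the vCPU
 * faulted at EL0. A fault taken at EL1, including inside an SDEI
 * handler, is resolved synchronously instead.
 */
static bool async_pf_allowed(uint64_t pstate)
{
	return pstate_to_el(pstate) == 0;
}
```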

Otherwise, it's possible that the guest can't continue its execution.
Besides, the software signaled event (0x0) is a normal event. These are
the only two events to be supported, and I assume the software signaled
event (0x0) is only used by selftests/kvm. So the Async PF event becomes
the only real one, and it can be of normal priority until another SDEI
event needs to be added and supported.

I believe there are multiple use cases for guest-initiated SDEI events
beyond just testing. Poking a hung PE is but one example.


Right. I think we can drop support for critical events. Let's do this
in the next respin. Before working on the next revision, I still want to
confirm the data structures with you. Please refer to the reply below
about the adjusted data structures.

However, the logic to support critical/normal events is already here, so
I think it's probably nice to keep it. At least, it makes it easier to
add a new SDEI event in the future. We dropped support for shared
events from v5 to v6, and I think we will probably never need a shared
event :)

But then we're sprinkling a lot of dead code throughout KVM, right? It
makes KVM's job even easier if it doesn't have to worry about nesting
SDEI events.
+struct kvm_sdei_exposed_event {
+	unsigned int	num;
+	unsigned char	type;
+	unsigned char	signaled;

what is this used for?


It indicates whether the event can be raised by software. All events
exposed by KVM should be raisable by software, so this should always
be true.

Isn't there always going to be some piece of software that raises an
event?

For KVM, we have guest-initiated 'software-signaled' events and KVM-initiated
async PF (whatever else may follow as well).


Yes, the assumption that all events are signaled by software holds,
so this field (@signaled) can be dropped as well. I plan to change
the data structures as below, following your suggestions. Please
double check whether anything is missed.

(1) Those fields of struct kvm_sdei_exposed_event are dropped or merged
    to struct kvm_sdei_event.

    struct kvm_sdei_event {
           unsigned int          num;
           unsigned long         ep_addr;
           unsigned long         ep_arg;
#define KVM_SDEI_EVENT_STATE_REGISTERED         0
#define KVM_SDEI_EVENT_STATE_ENABLED            1
#define KVM_SDEI_EVENT_STATE_UNREGISTER_PENDING 2
           unsigned long         state;                 /* accessed by {test,set,clear}_bit() */
           unsigned long         event_count;
    };

(2) In arch/arm64/kvm/sdei.c

    static struct kvm_sdei_event exposed_events[] = {
           { .num = SDEI_SW_SIGNALED_EVENT },
    };

(3) In arch/arm64/kvm/sdei.c::kvm_sdei_create_vcpu(), the SDEI events
    are instantiated from @exposed_events[]. This is what we already
    do; nothing changes.
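Points (1) to (3) can be sketched in userspace C as follows. The struct and exposed_events[] follow the proposal above; SDEI_SW_SIGNALED_EVENT is defined locally as 0 (the software signaled event number mentioned in this thread), while set_state(), test_state() and sdei_create_vcpu_events() are hypothetical stand-ins for the kernel's bit helpers and for kvm_sdei_create_vcpu().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SDEI_SW_SIGNALED_EVENT	0

struct kvm_sdei_event {
	unsigned int	num;
	unsigned long	ep_addr;
	unsigned long	ep_arg;
#define KVM_SDEI_EVENT_STATE_REGISTERED		0
#define KVM_SDEI_EVENT_STATE_ENABLED		1
#define KVM_SDEI_EVENT_STATE_UNREGISTER_PENDING	2
	unsigned long	state;		/* bit field indexed by the states above */
	unsigned long	event_count;
};

/* Simplified, non-atomic stand-ins for the kernel's set_bit()/test_bit(). */
static void set_state(struct kvm_sdei_event *e, int bit)
{
	e->state |= 1UL << bit;
}

static bool test_state(const struct kvm_sdei_event *e, int bit)
{
	return e->state & (1UL << bit);
}

static const struct kvm_sdei_event exposed_events[] = {
	{ .num = SDEI_SW_SIGNALED_EVENT },
};

#define NR_EXPOSED_EVENTS (sizeof(exposed_events) / sizeof(exposed_events[0]))

/* Hypothetical per-vCPU instantiation: one private copy of each exposed event. */
static struct kvm_sdei_event *sdei_create_vcpu_events(void)
{
	struct kvm_sdei_event *events = calloc(NR_EXPOSED_EVENTS, sizeof(*events));

	if (!events)
		return NULL;

	/* Copy the templates: each vCPU starts with @num set and no state bits. */
	memcpy(events, exposed_events, sizeof(exposed_events));
	return events;
}
```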

Do we need this if we disallow nesting events?


Yes, we need this. "event == NULL" is used as indication of invalid
context. @event is the associated SDEI event when the context is
valid.

What if we use some other plumbing to indicate the state of the vCPU? MP
state comes to mind, for example.


Even if the indication is done through another state, kvm_sdei_vcpu_context
still needs to be associated with the event. Once the vCPU context becomes
valid after the event is delivered, we still need to know the associated
event when some hypercalls are issued. SDEI_1_0_FN_SDEI_EVENT_COMPLETE
is one example: we need to decrease struct kvm_sdei_event::event_count
for that hypercall.

There are several options I can figure out for now. Please let me know your
preference.

(1) Replace 'struct kvm_sdei_event *event' with 'unsigned int num'. We use
    the check '@num == KVM_SDEI_INVALID_NUM' as the indication.
    KVM_SDEI_INVALID_NUM is defined as -1U. With this change, no pointer
    is left in kvm_sdei_vcpu_context.

(2) Add a field 'struct kvm_sdei_event *current_event' to kvm_sdei_vcpu
    to associate the event with the vCPU context. We still use the
    check '@current_event == NULL' as the indication.

(3) Apply both (1) and (2).
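Option (1) implies that SDEI_EVENT_COMPLETE has to look the event up by number before it can decrease event_count. A minimal sketch with reduced, hypothetical structs (sdei_event, sdei_vcpu and the helper names are illustrative only):

```c
#include <stdbool.h>
#include <stddef.h>

#define KVM_SDEI_INVALID_NUM	(-1U)

struct sdei_event {			/* reduced kvm_sdei_event */
	unsigned int	num;
	unsigned long	event_count;
};

struct sdei_vcpu {			/* reduced kvm_sdei_vcpu */
	struct sdei_event	*events;
	size_t			nr_events;
	unsigned int		context_num;	/* option (1): number, not pointer */
};

static struct sdei_event *sdei_find_event(struct sdei_vcpu *v, unsigned int num)
{
	for (size_t i = 0; i < v->nr_events; i++) {
		if (v->events[i].num == num)
			return &v->events[i];
	}
	return NULL;
}

/* Hypothetical COMPLETE handling; returns false if no handler is running. */
static bool sdei_event_complete(struct sdei_vcpu *v)
{
	struct sdei_event *e;

	if (v->context_num == KVM_SDEI_INVALID_NUM)
		return false;		/* invalid context: nothing to complete */

	e = sdei_find_event(v, v->context_num);
	if (e && e->event_count)
		e->event_count--;

	v->context_num = KVM_SDEI_INVALID_NUM;	/* context becomes invalid */
	return true;
}
```

The lookup is the price of dropping the pointer; with a single exposed event it is trivially cheap.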

Besides, the data structures need some adjustment, as you suggested
previously. The major changes are to drop @lock and critical event support.

   struct kvm_sdei_vcpu_context {
        struct kvm_sdei_event   *event;                      /* needs your confirmation */
        unsigned long           regs[18];
        unsigned long           pc;
        unsigned long           pstate;
   };

   struct kvm_sdei_vcpu {
        struct kvm_sdei_event           *events;            /* instantiated from @exposed_events[] */
        unsigned char                   masked;             /* vCPU is masked off or not           */
        unsigned long                   event_count;        /* Total count of pending events       */
        struct kvm_sdei_vcpu_context    context;            /* vCPU context for SDEI event         */
   };
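To show what the regs[18]/pc/pstate fields in kvm_sdei_vcpu_context are for, here is a hypothetical sketch of saving the interrupted context on delivery and restoring it on COMPLETE. It assumes the SDEI dispatch convention (x0 = event number, x1 = entry point argument, x2/x3 = interrupted PC/PSTATE, x0-x17 preserved); struct vcpu_regs and the `valid` flag are illustrative stand-ins, and setting the handler's own PSTATE (EL1h, exceptions masked) is omitted.

```c
#include <stdbool.h>
#include <string.h>

struct vcpu_regs {			/* hypothetical, reduced guest register state */
	unsigned long	x[31];
	unsigned long	pc;
	unsigned long	pstate;
};

struct kvm_sdei_vcpu_context {
	bool		valid;		/* stand-in for "event != NULL" */
	unsigned long	regs[18];	/* x0..x17 of the interrupted context */
	unsigned long	pc;
	unsigned long	pstate;
};

/* Save the interrupted state and redirect the vCPU to the handler. */
static void sdei_deliver(struct vcpu_regs *r, struct kvm_sdei_vcpu_context *ctx,
			 unsigned int num, unsigned long ep_addr,
			 unsigned long ep_arg)
{
	memcpy(ctx->regs, r->x, sizeof(ctx->regs));
	ctx->pc = r->pc;
	ctx->pstate = r->pstate;
	ctx->valid = true;

	r->x[0] = num;
	r->x[1] = ep_arg;
	r->x[2] = ctx->pc;
	r->x[3] = ctx->pstate;
	r->pc = ep_addr;
}

/* SDEI_EVENT_COMPLETE: resume the interrupted context. */
static void sdei_complete(struct vcpu_regs *r, struct kvm_sdei_vcpu_context *ctx)
{
	memcpy(r->x, ctx->regs, sizeof(ctx->regs));
	r->pc = ctx->pc;
	r->pstate = ctx->pstate;
	ctx->valid = false;
}
```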

+/*
+ * According to the SDEI specification (v1.1), the event number spans 32 bits
+ * and the lower 24 bits are used as the (real) event number. I don't
+ * think we can use that many event numbers in one system. So we reserve
+ * two bits from the 24-bit real event number to indicate its type:
+ * physical or virtual event. One reserved bit is enough for now, but
+ * two bits are reserved for possible future extension.
+ *
+ * The physical events are owned by firmware while the virtual events
+ * are used by VMM and KVM.

Doesn't KVM own everything? I don't see how the guest could interact
with another SDEI implementation.


I might be overthinking the scheme. The host's firmware might support
SDEI, and we might want to propagate events originating from the
host's firmware to the guest. In that case, we need to distinguish
events originating from the host's firmware from those originating from
KVM (the guest's firmware). Even if this case can never happen, I think
it's still nice to distinguish events from real firmware and from
KVM-emulated firmware.

The guest ABI w.r.t. SDEI is under full ownership of KVM. No other
implementation's events will ever be exposed to the guest.

Couldn't the guest own the host if it was talking to our firmware
anyway?


Right. Let's drop these macros and kvm_sdei_is_virtual() in the next respin.
As you suggested, we need to iterate over struct kvm_sdei_vcpu::events to
check whether the event (number) is valid.
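Validating an event number by walking the per-vCPU event array, instead of decoding type bits, might look like this. The return codes are defined locally with the values from the SDEI specification; the reduced structs and sdei_check_event_num() are hypothetical.

```c
#include <stddef.h>

/* Return codes from the SDEI specification (DEN0054). */
#define SDEI_SUCCESS		0
#define SDEI_INVALID_PARAMETERS	(-2)

struct sdei_event {
	unsigned int	num;
};

struct sdei_vcpu {
	struct sdei_event	*events;	/* instantiated from exposed_events[] */
	size_t			nr_events;
};

/*
 * Hypothetical helper: an event number is valid only if this vCPU has an
 * instance of it. This replaces the virtual/physical type bits entirely.
 */
static long sdei_check_event_num(const struct sdei_vcpu *v, unsigned long num)
{
	for (size_t i = 0; i < v->nr_events; i++) {
		if (v->events[i].num == num)
			return SDEI_SUCCESS;
	}
	return SDEI_INVALID_PARAMETERS;
}
```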

+ */
+#define KVM_SDEI_EVENT_NUM_TYPE_SHIFT	22
+#define KVM_SDEI_EVENT_NUM_TYPE_MASK	(3 << KVM_SDEI_EVENT_NUM_TYPE_SHIFT)
+#define KVM_SDEI_EVENT_NUM_TYPE_PHYS	0
+#define KVM_SDEI_EVENT_NUM_TYPE_VIRT	1
+
+static inline bool kvm_sdei_is_virtual(unsigned int num)
+{
+	unsigned int type;
+
+	type = (num & KVM_SDEI_EVENT_NUM_TYPE_MASK) >>
+	       KVM_SDEI_EVENT_NUM_TYPE_SHIFT;
+	if (type == KVM_SDEI_EVENT_NUM_TYPE_VIRT)
+		return true;
+
+	return false;
+}
+
+static inline bool kvm_sdei_is_sw_signaled(unsigned int num)
+{
+	return num == SDEI_SW_SIGNALED_EVENT;
+}

Couldn't the caller just check the event number on their own?


It would be hard, because the caller can be the guest. Generally, the
event and its associated information/state are accessed by hypercalls,
event injection and delivery, and migration (to be supported in the
future). So I think it's good to check the event number ourselves.

What I'm saying is, can't the caller of kvm_sdei_is_sw_signaled() just
do the comparison?


The only caller of kvm_sdei_is_sw_signaled() is hypercall_signal(). So
let's drop kvm_sdei_is_sw_signaled() and do the comparison in hypercall_signal()
in the next respin.
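With the helper gone, the open-coded comparison in hypercall_signal() could look roughly like this. Only the function name comes from the thread; the reduced vCPU struct, the pending flag and the local constants are assumptions, and looking up the target PE is omitted.

```c
#include <stdbool.h>

#define SDEI_SW_SIGNALED_EVENT	0	/* software signaled event (0x0) */
#define SDEI_SUCCESS		0
#define SDEI_INVALID_PARAMETERS	(-2)

/* Hypothetical reduced vCPU state: just a pending flag for event 0. */
struct sdei_vcpu {
	bool	sw_signaled_pending;
};

static long hypercall_signal(struct sdei_vcpu *target, unsigned long num)
{
	/* The comparison formerly wrapped by kvm_sdei_is_sw_signaled(). */
	if (num != SDEI_SW_SIGNALED_EVENT)
		return SDEI_INVALID_PARAMETERS;

	target->sw_signaled_pending = true;
	return SDEI_SUCCESS;
}
```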


Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


