On Mon, Sep 6, 2021 at 4:12 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> Hi Oliver,
>
> On Thu, 19 Aug 2021 23:36:34 +0100,
> Oliver Upton <oupton@xxxxxxxxxx> wrote:
> >
> > Certain VMMs/operators may wish to give their guests the ability to
> > initiate a system suspend that could result in the VM being saved to
> > persistent storage to be resumed at a later time. The PSCI v1.0
> > specification describes an SMC, SYSTEM_SUSPEND, that allows a kernel to
> > request a system suspend. This call is optional for v1.0, and KVM
> > elected to not support the call in its v1.0 implementation.
> >
> > This series adds support for the SYSTEM_SUSPEND PSCI call to KVM/arm64.
> > Since this is a system-scoped event, KVM cannot quiesce the VM on its
> > own. We add a new system exit type in this series to clue in userspace
> > that a suspend was requested. Per the KVM_EXIT_SYSTEM_EVENT ABI, a VMM
> > that doesn't care about this event can simply resume the guest without
> > issue (we set up the calling vCPU to come out of reset correctly on next
> > KVM_RUN).
>
> More idle thoughts on this:
>
> Although the definition of SYSTEM_SUSPEND is very simple from a PSCI
> perspective, I don't think it is that simple at the system level,
> because PSCI is only concerned with the CPU.
>
> For example, what is a wake-up event? My first approach would be to
> consider interrupts to be such events. However, this approach suffers
> from at least two issues:
>
> - How do you define which interrupts are actual wake-up events?
>   Nothing in the GIC architecture defines what a wake-up is (let alone
>   a wake-up event).

Good point. One possible implementation of suspend could just be a
`WFI` in a higher EL. In this case, KVM could emulate WFI wake-up
events according to D1.16.2 in DDI 0487G.a. But I agree, it isn't
entirely clear what constitutes a wake-up from a powered-down state.

> - Assuming you have a way to express the above, how do you handle
>   wake-ups from interrupts that have their source in the kernel (such
>   as timers, irqfd sources)?

I think this could be handled, so long as we allow userspace to
indicate it has woken a vCPU. Depending on this, in the next KVM_RUN
we'd say:

- Some IMP DEF event occurred; I'm waking this CPU now
- I've either chosen to ignore the guest or will defer to KVM's
  suspend implementation

> How do you cope with directly injected interrupts?

I'm no expert on this; I'll need to do a bit more reading to give a
good answer here.

> It looks to me that your implementation can only work with userspace
> provided events, which is pretty limited.

Right. I implemented this from the mindset that userspace may do
something heavyweight when a guest suspends, like save it to a
persistent store to resume later on. No matter what we do in KVM, I
think it's probably best to give userspace the right of first refusal
to handle the suspension.

> Other items worth considering: ongoing DMA, state of the caches at
> suspend time, device state in general. All of this really needs to be
> defined before we can move forward with this feature.

I believe it is largely up to the caller to get devices into a quiesced
state appropriate for a system suspend, but PSCI is delightfully vague
on this topic. By contrast, it is up to KVM's implementation to
guarantee caches are clean when servicing the guest request.

I'll crank on this a bit more and see if I have more thoughts.

--
Thanks,
Oliver