Re: [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey Marc,

On Tue, Sep 21, 2021 at 10:45:22AM +0100, Marc Zyngier wrote:
> > > > > - How do you define which interrupts are actual wake-up events?
> > > > >   Nothing in the GIC architecture defines what a wake-up is (let alone
> > > > >   a wake-up event).
> > > >
> > > > Good point.
> > > >
> > > > One possible implementation of suspend could just be a `WFI` in a
> > > > higher EL. In this case, KVM could emulate WFI wake up events
> > > > according to D1.16.2 in DDI 0487G.a. But I agree, it isn't entirely
> > > > clear what constitutes a wakeup from powered down state.
> > >
> > > It isn't, and it is actually IMPDEF (there isn't much in the ARM ARM
> > > in terms of what constitutes a low power state). And even if you
> > > wanted to emulate a WFI in userspace, the problem of interrupts that
> > > have their source in the kernel remains. How to you tell userspace
> > > that such an event has occurred if the vcpu thread isn't in the
> > > kernel?
> > 
> > Well, are there any objections to saying for the KVM implementation we
> > observe the WFI wake-up events per the cited section of the ARM ARM?
> 
> These are fine. However, what of the GIC, for example? Can any GIC
> interrupt wake-up the guest? I'm happy to say "yes" to this, but I
> suspect others will have a different idea, and the thought of
> introducing an IMPDEF wake-up interrupt controller doesn't fill me
> with joy.
>

I'm planning to propose exactly this in the next series; any GIC
interrupt will wake the guest. I'd argue that if someone wants to do
anything else, their window of opportunity is the exit to userspace.

[...]

> > Just to check understanding for v2:
> > 
> > We agree that an exit to userspace is fine so it has the opportunity
> > to do something crazy when the guest attempts a suspend. If a VMM does
> > nothing and immediately re-enters the kernel, we emulate the suspend
> > there by waiting for some event to fire, which for our purposes we
> > will say is an interrupt originating from userspace or the kernel
> > (WFI). In all, the SUSPEND exit type does not indicate that emulation
> > terminates with the VMM. It only indicates we are about to block in
> > the kernel.
> > 
> > If there is some IMPDEF event specific to the VMM, it should signal
> > the vCPU thread to kick it out of the kernel, make it runnable, and
> > re-enter. No need to do anything special from the kernel perspective
> > for this. This is only for the case where we decide to block in the
> > kernel.
> 
> This looks sensible. One question though: I think there is an implicit
> requirement that the guest should be "migratable" in that state. How
> does the above handles it? If the "suspend state" is solely held in
> the kernel, we need to be able to snapshot it, and I don't like the
> sound of that...
> 
> We could instead keep the "suspend state" in the VMM:
> 
> On PSCI_SUSPEND, the guest exits to userspace. If the VMM wants to
> honour the supend request, it reenters the guest with RUN+SUSPEND,
> which results in a WFI. On each wake-up, the guest exits to userspace,
> and it is the VMM responsibility to either perform the wake-up (RUN)
> or stay in suspend (RUN+SUSPEND).
> 
> This ensures that the guest never transitions out of suspend without
> the VMM knowing, and the VMM can always force a resume by kicking the
> thread back to userspace.
> 
> Thoughts?

Agreed. I was mulling on exactly how to clue in the VMM about the
suspend state. What if we just encode it in KVM_{GET,SET}_MP_STATE? We'd
avoid the need for new UAPI that way. We could introduce a new state,
KVM_MP_STATE_SUSPENDED, which would clue KVM to do the suspend as we've
discussed. We would exit to userspace with KVM_MP_STATE_RUNNABLE,
meaning the VMM would need to set the MP state explicitly for the
in-kernel suspend to work.

[...]

> > > > On the contrary, it is up to KVM's implementation to
> > > > guarantee caches are clean when servicing the guest request.
> > >
> > > This last point is pretty unclear to me. If the guest doesn't clean to
> > > the PoC (or even to one of the PoPs) when it calls into suspend,
> > > that's a clear indication that it doesn't care about its data. Why
> > > should KVM be more conservative here? It shouldn't be in the business
> > > of working around guest bugs.
> > 
> > PSCI is vague on this, sadly. DEN0022D.b, 5.4.8 "Implementation
> > responsibilities: Cache and coherency management states" that for
> > CPU_SUSPEND, the PSCI implementation must perform a cache clean
> > operation before entering the powerdown state. I don't see any reason
> > why SYSTEM_SUSPEND should be excluded from this requirement.
> 
> I'm not sure that's the case. CPU_SUSPEND may not use the resume
> entry-point if the suspend results is a shallower state than expected
> (i.e. the call just returns instead of behaving like a CPU boot).
> 
> However, a successful SYSTEM_SUSPEND always results in the deepest
> possible state. The guest should know that. There is also the fact
> that performing a full clean to the PoC is going to be pretty
> expensive, and I'd like to avoid that.
> 
> I'll try and reach out to some of the ARM folks for clarification on
> the matter.

That'd be very helpful!

--
Thanks,
Oliver



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux