Re: [PATCH v3 08/18] arm/arm64: KVM: Add PSCI version selection API

Andrew Jones <drjones@xxxxxxxxxx> · Mon, 5 Feb 2018 10:58:23 +0100

On Mon, Feb 05, 2018 at 09:24:33AM +0000, Marc Zyngier wrote:
> On 04/02/18 12:37, Christoffer Dall wrote:
> > On Sat, Feb 03, 2018 at 11:59:32AM +0000, Marc Zyngier wrote:
> >> On Fri, 2 Feb 2018 21:17:06 +0100
> >> Andrew Jones <drjones@xxxxxxxxxx> wrote:
> >>
> >>> On Thu, Feb 01, 2018 at 11:46:47AM +0000, Marc Zyngier wrote:
> >>>> Although we've implemented PSCI 1.0 and 1.1, nothing can select them
> >>>> Since all the new PSCI versions are backward compatible, we decide to
> >>>> default to the latest version of the PSCI implementation. This is no
> >>>> different from doing a firmware upgrade on KVM.
> >>>>
> >>>> But in order to give a chance to hypothetical badly implemented guests
> >>>> that would have a fit by discovering something other than PSCI 0.2,
> >>>> let's provide a new API that allows userspace to pick one particular
> >>>> version of the API.
> >>>>
> >>>> This is implemented as a new class of "firmware" registers, where
> >>>> we expose the PSCI version. This allows the PSCI version to be
> >>>> save/restored as part of a guest migration, and also set to
> >>>> any supported version if the guest requires it.
> >>>>
> >>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> >>>> ---
> >>>>  Documentation/virtual/kvm/api.txt      |  3 +-
> >>>>  Documentation/virtual/kvm/arm/psci.txt | 30 +++++++++++++++
> >>>>  arch/arm/include/asm/kvm_host.h        |  3 ++
> >>>>  arch/arm/include/uapi/asm/kvm.h        |  6 +++
> >>>>  arch/arm/kvm/guest.c                   | 13 +++++++
> >>>>  arch/arm64/include/asm/kvm_host.h      |  3 ++
> >>>>  arch/arm64/include/uapi/asm/kvm.h      |  6 +++
> >>>>  arch/arm64/kvm/guest.c                 | 14 ++++++-
> >>>>  include/kvm/arm_psci.h                 |  9 +++++
> >>>>  virt/kvm/arm/psci.c                    | 68 +++++++++++++++++++++++++++++++++-
> >>>>  10 files changed, 151 insertions(+), 4 deletions(-)
> >>>>  create mode 100644 Documentation/virtual/kvm/arm/psci.txt
> >>>>
> >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>>> index 57d3ee9e4bde..334905202141 100644
> >>>> --- a/Documentation/virtual/kvm/api.txt
> >>>> +++ b/Documentation/virtual/kvm/api.txt
> >>>> @@ -2493,7 +2493,8 @@ Possible features:
> >>>>  	  and execute guest code when KVM_RUN is called.
> >>>>  	- KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
> >>>>  	  Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
> >>>> -	- KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU.
> >>>> +	- KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision
> >>>> +          backward compatible with v0.2) for the CPU.
> >>>>  	  Depends on KVM_CAP_ARM_PSCI_0_2.
> >>>>  	- KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
> >>>>  	  Depends on KVM_CAP_ARM_PMU_V3.
> >>>> diff --git a/Documentation/virtual/kvm/arm/psci.txt b/Documentation/virtual/kvm/arm/psci.txt
> >>>> new file mode 100644
> >>>> index 000000000000..aafdab887b04
> >>>> --- /dev/null
> >>>> +++ b/Documentation/virtual/kvm/arm/psci.txt
> >>>> @@ -0,0 +1,30 @@
> >>>> +KVM implements the PSCI (Power State Coordination Interface)
> >>>> +specification in order to provide services such as CPU on/off, reset
> >>>> +and power-off to the guest.
> >>>> +
> >>>> +The PSCI specification is regularly updated to provide new features,
> >>>> +and KVM implements these updates if they make sense from a virtualization
> >>>> +point of view.
> >>>> +
> >>>> +This means that a guest booted on two different versions of KVM can
> >>>> +observe two different "firmware" revisions. This could cause issues if
> >>>> +a given guest is tied to a particular PSCI revision (unlikely), or if
> >>>> +a migration causes a different PSCI version to be exposed out of the
> >>>> +blue to an unsuspecting guest.
> >>>> +
> >>>> +In order to remedy this situation, KVM exposes a set of "firmware
> >>>> +pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
> >>>> +interface. These registers can be saved/restored by userspace, and set
> >>>> +to a convenient value if required.
> >>>> +
> >>>> +The following register is defined:
> >>>> +
> >>>> +* KVM_REG_ARM_PSCI_VERSION:
> >>>> +
> >>>> +  - Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
> >>>> +    (and thus has already been initialized)
> >>>> +  - Returns the current PSCI version on GET_ONE_REG (defaulting to the
> >>>> +    highest PSCI version implemented by KVM and compatible with v0.2)
> >>>> +  - Allows any PSCI version implemented by KVM and compatible with
> >>>> +    v0.2 to be set with SET_ONE_REG
> >>>> +  - Affects the whole VM (even if the register view is per-vcpu)  
> >>>
> >>
> >> Hi Drew,
> >>
> >> Thanks for looking into this, and for the exhaustive data.
> >>
> >>>
> >>> I've put some more thought and experimentation into this. I think we
> >>> should change to a vcpu feature bit. The feature bit would be used to
> >>> force compat mode, v0.2, so KVM would still enable the new PSCI
> >>> version by default. Below are two tables describing why I think we
> >>> should switch to something other than a new sysreg, and below those
> >>> tables are notes as to why I think we should use a vcpu feature. The
> >>> asterisks in the tables point out behaviors that aren't what we want.
> >>> While both tables have an asterisk, the sysreg approach's issue is
> >>> bug. The vcpu feature approach's issue is risk incurred from an
> >>> unsupported migration, albeit one that is hard to detect without a
> >>> new machine type.
> >>>
> >>>  +-----------------------------------------------------------------------+
> >>>  |                          sysreg approach                              |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | migration        | userspace | works |             notes              |
> >>>  |                  |  change   |       |                                |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | new    -> new    |   NO      |  YES  | Expected                       |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | old    -> new    |   NO      |  YES  | PSCI 1.0 is backward compatible|
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | new    -> old    |   NO      |  NO   | Migration fails due to the new |
> >>>  |                  |           |       | sysreg. Migration shouldn't    |
> >>>  |                  |           |       | have been attempted, but no    |
> >>>  |                  |           |       | way to know without a new      |
> >>>  |                  |           |       | machine type.                  |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | compat -> old    |   YES     |  NO*  | Even when setting PSCI version |
> >>>  |                  |           |       | to 0.2, we add a new sysreg,   |
> >>>  |                  |           |       | so migration will still fail.  |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | old    -> compat |   YES     |  YES  | It's OK for the destination to |
> >>>  |                  |           |       | support more sysregs than the  |
> >>>  |                  |           |       | source sends.                  |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  
> >>>  
> >>>  +-----------------------------------------------------------------------+
> >>>  |                        vcpu feature approach                          |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | migration        | userspace | works |             notes              |
> >>>  |                  |  change   |       |                                |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | new    -> new    |   NO      |  YES  | Expected                       |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | old    -> new    |   NO      |  YES  | PSCI 1.0 is backward compatible|
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | new    -> old    |   NO      |  YES* | Migrates, but it's not safe    |
> >>>  |                  |           |       | for the guest kernel, and no   |
> >>>  |                  |           |       | way to know without a new      |
> >>>  |                  |           |       | machine type.                  |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | compat -> old    |   YES     |  YES  | Expected                       |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>  | old    -> compat |   YES     |  YES  | Expected                       |
> >>>  +------------------+-----------+-------+--------------------------------+
> >>>
> >>>
> >>> Notes as to why the vcpu feature approach was selected:
> >>>
> >>> 1) While this is VM state, and thus a VM control would be a more natural
> >>>    fit, a new vcpu feature bit would be much less new code. We also
> >>>    already have a PSCI vcpu feature bit, so a new one will actually fit
> >>>    quite well.
> >>>
> >>> 2) No new state needs to be migrated, as we already migrate the feature
> >>>    bitmap. Unlike, KVM, QEMU doesn't track the max number of features,
> >>>    so bumping it one more in KVM doesn't require a QEMU change.
> >>>
> >>>
> >>> If we switch to a vcpu feature bit, then I think this patch can be
> >>> replaced with something like this
> >>
> >> A couple of remarks:
> >>
> >> - My worry with this feature bit  is that it is a point fix, and it
> >>   doesn't scale. Come PSCI 1.2 and WORKAROUND_2, what do we do? Add
> >>   another feature bit that says "force to 1.0"? I'd really like
> >>   something we can live with in the long run, and "KVM as firmware"
> >>   needs to be able to evolve without requiring a new userspace
> >>   interface each time we rev it.
> >>
> >> - The "compat->old" entry in your sysreg table is not quite fair. In
> >>   the feature table, you teach userspace about the new feature bit. You
> >>   could just as well teach userspace about the new sysreg. Yes, things
> >>   may be different in QEMU, but that's not what we're talking about
> >>   here.
> >>
> >> - Allowing a guest to migrate in an unsafe way seems worse than failing
> >>   a migration unexpectedly. Or at least not any better.
> >>
> >> To be clear: I'm not dismissing the idea at all, but I want to make sure
> >> we're not cornering ourselves into an uncomfortable place.
> >>
> >> Christoffer, Peter, what are your thoughts on this?
> >>
> > 
> > Taking a step back, the only reasons why this patch isn't simply
> > enabling PSCI v1.0 by default (without any selection method) are that we
> > (1) want to support guests that complain about PSCI_VERSION != 0.2
> > (which isn't completely outside the realm of a reasonable implementation
> > if you read the description of PSCI_VERSION in the 0.2 spec) and (2) to
> > provide migration support for guests that call
> > PSCI_1_0_FN_PSCI_FEATURES.
> > 
> > If we ignore (1) because we don't know of any guests where this is an
> > issue, then it's all about (2), migration from "new -> old".
> > 
> > As far as I can tell, the use case we are worried about here is updating
> > the kernel (and not QEMU) on half of your data center and then trying to
> > migrate from the upgraded kernel machine to a legacy (and potentially
> > variant 2 vulnerable) machine.  For this specific move from PSCI 0.2 to
> > 1.0 with the included mitigation, I don't really think this is an
> > important use case to support.
> 
> I'm not so sure. Promising mitigation to a guest, and then seeing that
> mitigation being silently taken away because we've allowed it to migrate
> seem bad to me.
> 
> > In terms of the more general approach to "KVM firmware upgrades" and
> > migration, I think something like the proposed FW register interface
> > here should work, but I'm concerned about the lack of opportunity from
> > userspace to predict a migration failure.  But I don't understand why
> 
> Userspace could predict some of the failure cases, if only by checking
> that all registers can be restored in a new guest. I'm not sure how
> viable this is in a data centre type of environment.
> 
> > this requires a new machine type?  Why can't we simply provide a KVM
> > capability that libvirt etc. can query for?
> > 
> > Also, is it generally true that we can't expose any additional system
> > registers from KVM without breaking migration and we don't have any
> > method to deal with that in userspace and upper layers?  If that's true,
> 
> It is my understanding that each time we add a new sysreg to KVM,
> migration in QEMU breaks in the new->old direction.
> 
> > that's a bigger problem in general and something we should work on
> > trying to solve.  If it's not true, then there should be some method to
> > deal with the FW register already (like capabilities).
> > 
> > Given the urgency of adding mitigation towards variant 2 which is the
> > driver for this work, I think we should drop the compat functionality in
> > this series and work this out later on if needed.  I think we can just
> > tweak the previous patch to enable PSCI 1.0 by default and drop this
> > patch for the current merge window.
> 
> I'd be fine with that, as long as we have a clear agreement on the
> impact of such a move.

Yeah, that's what I was trying to figure out with my fancy tables. I might
be coming around more to your approach now, though. Ensuring the new->old
migration fails is a nice feature of this series. It would be good if
we could preserve that behavior without committing to a new userspace
interface, but I'm not sure how. Maybe I should just apologize for the
noise, and this patch be left as is...

Thanks,
drew
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm