Re: KVM/arm64: Guest ABI changes do not appear rollback-safe

Oliver Upton <oupton@xxxxxxxxxx> · Fri, 1 Oct 2021 08:38:35 -0700

On Fri, Oct 1, 2021 at 4:43 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> On Thu, 30 Sep 2021 18:24:23 +0100,
> Oliver Upton <oupton@xxxxxxxxxx> wrote:
> >
> > Hey Marc,
> >
> > On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > >
> > > Hi Oliver,
> > >
> > > On Wed, 29 Sep 2021 19:22:05 +0100,
> > > Oliver Upton <oupton@xxxxxxxxxx> wrote:
> > > >
> > > > I have some lingering thoughts on this subject since we last spoke and
> > > > wanted to discuss.
> > > >
> > > > I'm having a hard time figuring out how a VMM should handle a new
> > > > hypercall identity register introduced on a newer kernel. In order to
> > > > maintain guest ABI, the VMM would need to know about that register and
> > > > zero it when restoring an older guest.
> > >
> > > Just as it would need to be able to discover any new system register
> > > exposed by default, as it happens at all times. Which is why we have a
> > > way to discover all the registers, architected or not.
> > >
> > > > Perhaps instead we could reserve a range of firmware registers as the
> > > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > > default, encouraging userspace to zero these registers away for older
> > > > VMs but still allowing an old userspace to pick up new KVM features.
> > > > Doing so would align the hypercall identity registers with the feature
> > > > ID registers from the architecture.
> > >
> > > The range already exists in the form of the "coprocessor" 0x14. I
> > > don't see the need to expose it as RAZ/WI, however. If userspace
> > > doesn't know about how this pseudo-register works, it won't be able to
> > > program it anyway.
> > >
> > > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > > id-register is also well defined (feature not implemented), and the
> > > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > > mechanism.
> > >
> > > Our firmware registers don't follow the same rules. Userspace can
> > > write to them, and there is no such "not implemented" rule (case in
> > > point, PSCI). We also have a separate enumeration mechanism
> > > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > > what is implemented.
> > >
> > > For these reasons, I don't immediately see the point of advertising a
> > > set of registers ahead of time, before userspace grows an
> > > understanding of what these registers mean.
> >
> > Supposing we don't preallocate some hypercall ID registers, how can we
> > safely migrate a guest from an older kernel to newer kernel? Ideally,
> > we would preserve the hypercall feature set across the migration which
> > could work for a while with the first set of registers that get
> > defined, but whenever a new hypercall firmware register comes along
> > then the VMM will be clueless to the new ABI.
>
> My expectations were that whenever userspace writes a set of firmware
> register, all the defaults are invalidated. For example say that
> host-A know about a single hypercall register, while host-B knows
> about two. Upon writing to the first register, the host would clear
> the set of available services in the second one. If userspace
> eventually writes there, the value would stick if valid.
>
> Also, remember that pseudo-registers don't have to be 64bit. We could
> define a new class of hypercall-specific registers that would be much
> wider, and thus have a larger write granularity (KVM supports anything
> from 8 to 2048 bits). This would make it pretty easy to implement the
> above.
>
> > Fundamentally, I don't think userspace should need a patch to preserve
> > ABI on a newer kernel. Despite that, it would seem that userspace will
> > need to learn of any firmware registers that control hypercall
> > features which come after the initial set that gets proposed. If
> > KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> > architectural feature ID registers) from other parts of the vCPU
> > state, it would be clear to what registers to zero on a newer kernel.
> > Apologies if it is distracting to mention the feature ID registers
> > here, but both are on my mind currently and want to make sure there is
> > some consistency in how features get handled on newer kernels,
> > architected or not.
>
> The problem I see is that we will always need to grow the HC register
> space one way or another, no matter how many we reserve. Your approach
> only works if we don't exceed that particular range. Maybe it will
> never be a problem, but it remains that this is not scalable.
>
> If we wanted to be safe, we'd reserve the whole of the possible space
> as defined by the SMCCC spec. Given that we currently have two HC
> spaces (the ARM-controlled one, and the KVM-specific one), the
> function space being 16bits in each case, that's 16kB worth of zeroes
> that userspace has to save/restore at all times... I'm not overly
> enthusiastic.

Yeah, that's definitely overkill.

I agree with your position; we can solve the issue altogether by
enforcing ordering on the hypercall registers. Userspace should read
all hypercall registers to discover features, then write them. Given
your suggestion, I don't know if there is much need for the guesswork
to conclude on an appropriate register size for the hypercall bitmaps.
64 bits seems fine to me..

--
Thanks,
Oliver