On Fri, Oct 1, 2021 at 4:43 AM Marc Zyngier <maz@xxxxxxxxxx> wrote: > > On Thu, 30 Sep 2021 18:24:23 +0100, > Oliver Upton <oupton@xxxxxxxxxx> wrote: > > > > Hey Marc, > > > > On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@xxxxxxxxxx> wrote: > > > > > > Hi Oliver, > > > > > > On Wed, 29 Sep 2021 19:22:05 +0100, > > > Oliver Upton <oupton@xxxxxxxxxx> wrote: > > > > > > > > I have some lingering thoughts on this subject since we last spoke and > > > > wanted to discuss. > > > > > > > > I'm having a hard time figuring out how a VMM should handle a new > > > > hypercall identity register introduced on a newer kernel. In order to > > > > maintain guest ABI, the VMM would need to know about that register and > > > > zero it when restoring an older guest. > > > > > > Just as it would need to be able to discover any new system register > > > exposed by default, as it happens at all times. Which is why we have a > > > way to discover all the registers, architected or not. > > > > > > > Perhaps instead we could reserve a range of firmware registers as the > > > > 'hypercall identity' registers. Implement all of them as RAZ/WI by > > > > default, encouraging userspace to zero these registers away for older > > > > VMs but still allowing an old userspace to pick up new KVM features. > > > > Doing so would align the hypercall identity registers with the feature > > > > ID registers from the architecture. > > > > > > The range already exists in the form of the "coprocessor" 0x14. I > > > don't see the need to expose it as RAZ/WI, however. If userspace > > > doesn't know about how this pseudo-register works, it won't be able to > > > program it anyway. > > > > > > I don't buy the parallel with the ID-regs either. They are RAZ/WI by > > > default so that they don't UNDEF at runtime. The meaning of a RAZ > > > id-register is also well defined (feature not implemented), and the > > > CPU cannot write to them. In a way, the ID-regs *are* the enumeration > > > mechanism. > > > > > > Our firmware registers don't follow the same rules. Userspace can > > > write to them, and there is no such "not implemented" rule (case in > > > point, PSCI). We also have a separate enumeration mechanism > > > (GET_ONE_REG), which is (more or less) designed for userspace to find > > > what is implemented. > > > > > > For these reasons, I don't immediately see the point of advertising a > > > set of registers ahead of time, before userspace grows an > > > understanding of what these registers mean. > > > > Supposing we don't preallocate some hypercall ID registers, how can we > > safely migrate a guest from an older kernel to newer kernel? Ideally, > > we would preserve the hypercall feature set across the migration which > > could work for a while with the first set of registers that get > > defined, but whenever a new hypercall firmware register comes along > > then the VMM will be clueless to the new ABI. > > My expectations were that whenever userspace writes a set of firmware > register, all the defaults are invalidated. For example say that > host-A know about a single hypercall register, while host-B knows > about two. Upon writing to the first register, the host would clear > the set of available services in the second one. If userspace > eventually writes there, the value would stick if valid. > > Also, remember that pseudo-registers don't have to be 64bit. We could > define a new class of hypercall-specific registers that would be much > wider, and thus have a larger write granularity (KVM supports anything > from 8 to 2048 bits). This would make it pretty easy to implement the > above. > > > Fundamentally, I don't think userspace should need a patch to preserve > > ABI on a newer kernel. Despite that, it would seem that userspace will > > need to learn of any firmware registers that control hypercall > > features which come after the initial set that gets proposed. If > > KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall, > > architectural feature ID registers) from other parts of the vCPU > > state, it would be clear to what registers to zero on a newer kernel. > > Apologies if it is distracting to mention the feature ID registers > > here, but both are on my mind currently and want to make sure there is > > some consistency in how features get handled on newer kernels, > > architected or not. > > The problem I see is that we will always need to grow the HC register > space one way or another, no matter how many we reserve. Your approach > only works if we don't exceed that particular range. Maybe it will > never be a problem, but it remains that this is not scalable. > > If we wanted to be safe, we'd reserve the whole of the possible space > as defined by the SMCCC spec. Given that we currently have two HC > spaces (the ARM-controlled one, and the KVM-specific one), the > function space being 16bits in each case, that's 16kB worth of zeroes > that userspace has to save/restore at all times... I'm not overly > enthusiastic. Yeah, that's definitely overkill. I agree with your position; we can solve the issue altogether by enforcing ordering on the hypercall registers. Userspace should read all hypercall registers to discover features, then write them. Given your suggestion, I don't know if there is much need for the guesswork to conclude on an appropriate register size for the hypercall bitmaps. 64 bits seems fine to me.. -- Thanks, Oliver