Hi Oliver,

Please Cc the KVM/arm64 reviewers (now added). Also, please consider
subscribing to the kvmarm mailing list so that I don't have to manually
approve your posts ;-).

On Tue, 08 Jun 2021 22:47:34 +0100,
Oliver Upton <oupton@xxxxxxxxxx> wrote:
>
> ARMv8 provides for a virtual counter-timer offset that is added to guest
> views of the virtual counter-timer (CNTVOFF_EL2). To date, KVM has not
> provided userspace with any perception of this, and instead affords a
> value-based scheme of migrating the virtual counter-timer by directly
> reading/writing the guest's CNTVCT_EL0. This is problematic because
> counters continue to elapse while the register is being written, meaning
> it is possible for drift to sneak in to the guest's time scale. This is
> exacerbated by the fact that KVM will calculate an appropriate
> CNTVOFF_EL2 every time the register is written, which will be broadcast
> to all virtual CPUs. The only possible way to avoid causing guest time
> to drift is to restore counter-timers by offset.

Well, the current method has one huge advantage: time can never go
backward from the guest PoV if you restore what you have saved. Yes,
time can elapse, but you don't even need to migrate to observe that.

>
> Implement initial support for KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> to migrate the value of CNTVOFF_EL2. These ioctls yield precise control
> of the virtual counter-timers to userspace, allowing it to define its
> own heuristics for managing vCPU offsets.

I'm not really in favour of inventing a completely new API, for
multiple reasons:

- CNTVOFF is an EL2 concept. I'd rather not expose it as such, as it
  becomes really confusing with NV (which does expose its own CNTVOFF
  via the ONE_REG interface)

- You seem to allow each vcpu to get its own offset. I don't think
  that's right. The architecture defines that all PEs have the same
  view of the counters, and an EL1 guest should be given that illusion.

- by having a parallel save/restore interface, you make it harder to
  reason about what happens with concurrent calls to both interfaces

- the userspace API is already horribly bloated, and I'm not overly
  keen on adding more if we can avoid it.

I'd rather you extend the current ONE_REG interface and make it modal,
either allowing the restore of an absolute value or an offset for
CNTVCT_EL0. This would also keep a consistent behaviour when restoring
vcpus. The same logic would apply to the physical offset.

As for how to make it modal, we have plenty of bits left in the ONE_REG
encoding. Pick one, and make that a "relative" attribute. This will
result in some minor surgery in the get/set code paths, but at least no
entirely new mechanism.

One question though: how do you plan to reliably compute the offset?
As far as I can see, it is subject to the same issues you described
above (while the guest is being restored, time flies), and you have
the added risk of exposing a counter going backward from a guest
perspective.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.