On Wed, Nov 28, 2018 at 02:45:16PM +0000, Steven Price wrote: > We introduce a paravirtualization interface for KVM/arm64 based on the > "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A. > > User space can specify a reserved area of memory for the guest and > inform KVM to populate the memory with information on stolen time and > Live Physical Time (LPT) that can be used to derive a stable > counter/timer for a guest subject to migration between hosts with > different counter frequencies. > > A hypercall interface is provided for the guest to interrogate the > hypervisor's support for this interface and the location of the shared > memory structures. > > Signed-off-by: Steven Price <steven.price@xxxxxxx> > --- > Documentation/virtual/kvm/arm/pvtime.txt | 169 +++++++++++++++++++++++ > 1 file changed, 169 insertions(+) > create mode 100644 Documentation/virtual/kvm/arm/pvtime.txt > > diff --git a/Documentation/virtual/kvm/arm/pvtime.txt b/Documentation/virtual/kvm/arm/pvtime.txt > new file mode 100644 > index 000000000000..1870b904075b > --- /dev/null > +++ b/Documentation/virtual/kvm/arm/pvtime.txt > @@ -0,0 +1,169 @@ > +Paravirtualized time support for arm64 > +====================================== > + > +Arm specification DEN0057/A defined a standard for paravirtualised time > +support for Aarch64 guests: > + > +https://developer.arm.com/docs/den0057/a > + > +KVM/Arm64 implements this specification by providing some hypervisor service > +calls to support a paravirtualized guest obtaining a view of the amount of > +time stolen from its execution and a concept of Live Physical Time (LPT) which > +represents time during which the guest is running and works across migrations. > + > +Three new SMCCC compatible hypercalls are defined: > + > +PV_FEATURES 0xC5000020 > +PV_TIME_LPT 0xC5000021 > +PV_TIME_ST 0xC5000022 > + > +These are only available in the SMC64/HVC64 calling convention as > +paravirtualized time is not available to 32 bit Arm guests. > + > +PV_FEATURES > + Function ID: (uint32) : 0xC5000020 > + PV_func_id: (uint32) : Either PV_TIME_LPT or PV_TIME_ST > + Return value: (int32) : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant > + PV-time feature is supported by the hypervisor. What happens when a new guest tries this hypercall on an old host? Why not return a bitmap of all supported features so the guest doesn't have to do multiple hypercalls to build its bitmap? > + > +PV_TIME_LPT > + Function ID: (uint32) : 0xC5000021 > + Flags: (uint32) : Bit[0]: Request migration interrupts > + (not currently supported by KVM) > + Return value: (int64) : IPA of the shared live physical time data > + structure or negative error code on failure: > + NOT_SUPPORTED (-1) > + INVALID_PARAMETERS (-2) > + > +PV_TIME_ST > + Function ID: (uint32) : 0xC5000022 > + Return value: (int64) : IPA of the stolen time data structure for this > + (V)CPU. On failure: > + NOT_SUPPORTED (-1) > + > +Live Physical Time > +------------------ > + > +The structure pointed to by the PV_TIME_LPT hypercall is as follows: > + > + Field | Byte Length | Byte Offset | Description > + --------------- | ----------- | ----------- | ------------------------------- > + Revision | 4 | 0 | Must be 0 for this revision > + Attributes | 4 | 4 | Must be 0 > + sequence_number | 8 | 8 | Bit 0: reserved > + | | | Bits 1:63 number of migrations > + scale_mult | 8 | 16 | Multiplier to scale from native > + | | | frequency to PV frequency > + shift | 4 | 24 | Shift applied before multiplier > + Reserved | 4 | 28 | Must be 0 > + Fn | 8 | 32 | Native frequency > + Fpv | 8 | 40 | Paravirtualized frequency seen > + | | | by guest > + div_by_fpv_mult | 8 | 48 | Multiplier to implement fast > + | | | divide by Fpv Here's kvmclock's struct struct pvclock_vcpu_time_info { u32 version; u32 pad0; u64 tsc_timestamp; u64 system_time; u32 tsc_to_system_mul; s8 tsc_shift; u8 flags; u8 pad[2]; } Revision => Attributes => sequence_number => version scale_mult => tsc_to_system_mul (this is reversed, but OK) shift => tsc_shift (also reversed) Reserved => Fn => (pvclock doesn't have, but does have system_time) Fpv => div_by_fpv_mult => I haven't thought about this enough yet to be sure kvmclock's fields are sufficient, but several look close - although the 'tsc' naming isn't nice. Also, the pvclock struct could be extended by adding an 'extended' flag to 'flags', and then appending more fields. > + > +Where scale_mult is defined as 2^(64-shift) * Fpv / Fn > + > +The structure will be updated by the hypervisor whenever the guest is migrated > +to a new host. It will be present within a reserved region of the normal > +memory given to the guest. The guest should not attempt to write into this > +memory. > + > +Stolen Time > +----------- > + > +The structure pointed to by the PV_TIME_ST hypercall is as follows: > + > + Field | Byte Length | Byte Offset | Description > + ----------- | ----------- | ----------- | -------------------------- > + Revision | 4 | 0 | Must be 0 for version 0.1 > + Attributes | 4 | 4 | Must be 0 > + Stolen time | 8 | 8 | Stolen time in unsigned > + | | | nanoseconds indicating how > + | | | much time this VCPU thread > + | | | was involuntarily not > + | | | running on a physical CPU. > + > +The structure will be updated by the hypervisor periodically as time is stolen > +from the VCPU. It will be present within a reserved region of the normal > +memory given to the guest. The guest should not attempt to write into this > +memory. There is a structure by VCPU of the guest. > + > +User space interface > +==================== > + > +User space can request that KVM provide the paravirtualized time interface to > +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example: > + > + struct kvm_create_device pvtime_device = { > + .type = KVM_DEV_TYPE_ARM_PV_TIME, > + .attr = 0, > + .flags = 0, > + }; > + > + pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device); > + > +The guest IPA of the structures must be given to KVM. This is the address of > +the LPT structure and the base address of an array of stolen time structures > +(one for each VCPU). For example: > + > + struct kvm_device_attr lpt_base = { > + .group = KVM_DEV_ARM_PV_TIME_PADDR, > + .attr = KVM_DEV_ARM_PV_TIME_LPT, > + .addr = (u64)(unsigned long)&lpt_paddr > + }; > + struct kvm_device_attr st_base = { > + .group = KVM_DEV_ARM_PV_TIME_PADDR, > + .attr = KVM_DEV_ARM_PV_TIME_ST, > + .addr = (u64)(unsigned long)&st_paddr > + }; > + > + ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &lpt_base); > + ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &st_base); > + > +The paravirtualized frequency of the guest can also be set. By default this > +will be the counter frequency of the host. However when migrating a guest from > +another host, this must be manually set to ensure that the guest sees the same > +frequency. > + > + u32 frequency; > + > + struct kvm_device_attr lpt_freq = { > + .group = KVM_DEV_ARM_PV_TIME_FREQUENCY, > + .attr = KVM_DEV_ARM_PV_TIME_LPT, > + .addr = (u64)(unsigned long)&frequency > + }; > + > + ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &lpt_freq); > + > +For migration (or save/restore) of a guest it is necessary to save the contents > +of the shared pages and later restore them. KVM_DEV_ARM_PV_TIME_STATE_SIZE > +provides the size of this data and KVM_DEV_ARM_PV_TIME_STATE allows the state > +to be read/written. The state for stolen time and LPT are accessed separately. I guess there's no harm in userspace accessing these separately, but I hope we'll be trying to put all the PV structures on one shared page for the guest. Thanks, drew > +It is also necessary for the physical address and frequency to be set > +identically when restoring. The kernel will update the structure on first run > +of the vCPU(s) to contain the new coefficients. > + > + void *save_state(int fd, u64 attr, u32 *size) { > + struct kvm_device_attr get_size = { > + .group = KVM_DEV_ARM_PV_TIME_STATE_SIZE, > + .attr = attr, > + .addr = (u64)(unsigned long)size > + }; > + > + ioctl(fd, KVM_GET_DEVICE_ATTR, get_size); > + > + void *buffer = malloc(*size); > + > + struct kvm_device_attr get_state = { > + .group = KVM_DEV_ARM_PV_TIME_STATE, > + .attr = attr, > + .addr = (u64)(unsigned long)size > + }; > + > + ioctl(fd, KVM_GET_DEVICE_ATTR, buffer); > + } > + > + void *lpt_state = save_state(pvtime_fd, KVM_DEV_ARM_PV_TIME_LPT, &lpt_size); > + void *st_state = save_state(pvtime_fd, KVM_DEV_ARM_PV_TIME_ST, &st_size); > + > -- > 2.19.2 > > _______________________________________________ > kvmarm mailing list > kvmarm@xxxxxxxxxxxxxxxxxxxxx > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm