On Tue, Aug 27, 2019 at 10:57:06AM +0200, Christoffer Dall wrote:
> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
> > Introduce a paravirtualization interface for KVM/arm64 based on the
> > "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
> >
> > This only adds the details about "Stolen Time" as the details of "Live
> > Physical Time" have not been fully agreed.
> >
> > User space can specify a reserved area of memory for the guest and
> > inform KVM to populate the memory with information on time that the host
> > kernel has stolen from the guest.
> >
> > A hypercall interface is provided for the guest to interrogate the
> > hypervisor's support for this interface and the location of the shared
> > memory structures.
> >
> > Signed-off-by: Steven Price <steven.price@xxxxxxx>
> > ---
> >  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
> >  1 file changed, 100 insertions(+)
> >  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
> >
> > diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
> > new file mode 100644
> > index 000000000000..1ceb118694e7
> > --- /dev/null
> > +++ b/Documentation/virt/kvm/arm/pvtime.txt
> > @@ -0,0 +1,100 @@
> > +Paravirtualized time support for arm64
> > +======================================
> > +
> > +Arm specification DEN0057/A defined a standard for paravirtualised time
> > +support for AArch64 guests:
> > +
> > +https://developer.arm.com/docs/den0057/a
> > +
> > +KVM/arm64 implements the stolen time part of this specification by providing
> > +some hypervisor service calls to support a paravirtualized guest obtaining a
> > +view of the amount of time stolen from its execution.
> > +
> > +Two new SMCCC compatible hypercalls are defined:
> > +
> > +PV_FEATURES 0xC5000020
> > +PV_TIME_ST  0xC5000022
> > +
> > +These are only available in the SMC64/HVC64 calling convention as
> > +paravirtualized time is not available to 32 bit Arm guests. The existence of
> > +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
> > +mechanism before calling it.
> > +
> > +PV_FEATURES
> > +    Function ID:  (uint32)  : 0xC5000020
> > +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
> > +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
> > +                              PV-time feature is supported by the hypervisor.
> > +
> > +PV_TIME_ST
> > +    Function ID:  (uint32)  : 0xC5000022
> > +    Return value: (int64)   : IPA of the stolen time data structure for this
> > +                              (V)CPU. On failure:
> > +                              NOT_SUPPORTED (-1)
> > +
> > +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
> > +with inner and outer write back caching attributes, in the inner shareable
> > +domain. A total of 16 bytes from the IPA returned are guaranteed to be
> > +meaningfully filled by the hypervisor (see structure below).
> > +
> > +PV_TIME_ST returns the structure for the calling VCPU.
> > +
> > +Stolen Time
> > +-----------
> > +
> > +The structure pointed to by the PV_TIME_ST hypercall is as follows:
> > +
> > +  Field       | Byte Length | Byte Offset | Description
> > +  ----------- | ----------- | ----------- | --------------------------
> > +  Revision    |      4      |      0      | Must be 0 for version 0.1
> > +  Attributes  |      4      |      4      | Must be 0
> > +  Stolen time |      8      |      8      | Stolen time in unsigned
> > +              |             |             | nanoseconds indicating how
> > +              |             |             | much time this VCPU thread
> > +              |             |             | was involuntarily not
> > +              |             |             | running on a physical CPU.
> > +
> > +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
> > +will be present within a reserved region of the normal memory given to the
> > +guest. The guest should not attempt to write into this memory. There is a
> > +structure per VCPU of the guest.
> > +
> > +User space interface
> > +====================
> > +
> > +User space can request that KVM provide the paravirtualized time interface to
> > +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
> > +
> > +    struct kvm_create_device pvtime_device = {
> > +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
> > +            .attr = 0,
> > +            .flags = 0,
> > +    };
> > +
> > +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
> > +
> > +Creation of the device should be done after creating the vCPUs of the virtual
> > +machine.
> > +
> > +The IPA of the structures must be given to KVM. This is the base address
> > +of an array of stolen time structures (one for each VCPU). The base address
> > +must be page aligned. The size must be at least 64 * number of VCPUs and be a
> > +multiple of PAGE_SIZE.
> > +
> > +The memory for these structures should be added to the guest in the usual
> > +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
> > +
> > +For example:
> > +
> > +    struct kvm_dev_arm_st_region region = {
> > +            .gpa = <IPA of guest base address>,
> > +            .size = <size in bytes>
> > +    };
>
> This feels fragile; how are you handling userspace creating VCPUs after
> setting this up, the GPA overlapping guest memory, etc.? Is the
> philosophy here that the VMM can mess up the VM if it wants, but that
> this should never lead to attacks on the host (we better hope not) and so
> we don't care?
>
> It seems to me setting the IPA per vcpu through the VCPU device would
> avoid a lot of these issues. See
> Documentation/virt/kvm/devices/vcpu.txt.
>

I discussed this with Marc the other day, and we realized that if we
make the configuration of the IPA per-PE, then a VMM can construct a VM
where these data structures are distributed within the IPA space of the
VM, which could lead to lower TLB pressure for some
configurations/workloads.

Thanks,

    Christoffer
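
For reference, the per-VCPU layout and the region sizing rule described in
the quoted documentation can be sketched in C roughly as follows. This is
only an illustration under stated assumptions, not code from the patch: the
names pvtime_stolen_time, PVTIME_BYTES_PER_VCPU and pvtime_region_size are
hypothetical, and the page size is passed in as a parameter rather than
taken from any kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical name. Field sizes and offsets follow the table in the
 * quoted pvtime.txt: 16 bytes filled in by the hypervisor per VCPU.
 */
struct pvtime_stolen_time {
	uint32_t revision;    /* offset 0: must be 0 for version 0.1 */
	uint32_t attributes;  /* offset 4: must be 0 */
	uint64_t stolen_time; /* offset 8: stolen time in nanoseconds */
};

static_assert(sizeof(struct pvtime_stolen_time) == 16,
	      "structure must be 16 bytes");
static_assert(offsetof(struct pvtime_stolen_time, attributes) == 4,
	      "attributes must be at byte offset 4");
static_assert(offsetof(struct pvtime_stolen_time, stolen_time) == 8,
	      "stolen_time must be at byte offset 8");

/* The quoted text requires at least 64 bytes of region per VCPU. */
#define PVTIME_BYTES_PER_VCPU 64u

/*
 * Hypothetical helper: smallest size that is at least
 * 64 * nr_vcpus and a multiple of page_size.
 */
static inline uint64_t pvtime_region_size(uint64_t nr_vcpus,
					  uint64_t page_size)
{
	uint64_t bytes = nr_vcpus * PVTIME_BYTES_PER_VCPU;

	/* Round up to the next multiple of page_size. */
	return (bytes + page_size - 1) / page_size * page_size;
}
```

With a single region like this, every VCPU's structure sits in one
contiguous, page-aligned block; a per-VCPU IPA would remove the need for
the sizing helper altogether.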