Re: KVM-Clock

Wanpeng Li <kernellwp@xxxxxxxxx> · Tue, 23 Feb 2016 16:04:29 +0800



2016-02-23 15:11 GMT+08:00 Avi Cohen <avi.cohen@xxxxxxxxxx>:
>
> 2016-02-21 16:57+0000, Avi Cohen:
>> Hello,
>>
>> Last week I sent a mail regarding kvm-clock accuracy.
>> I will now try to restate my question; any answer, partial answer, or hint is
>> greatly appreciated.
>>
>> Our application is running in a tenant's virtual machine in a data centre.
>> We have some OAM functions running in the VMs.
>> One OAM function is to measure the one-way delay between VMa and VMb.
>> One-way delay measurement requires that all machines be synchronized to a common central clock.
>> The accuracy requirement is on the order of tens of nanoseconds, hence only 1588v2/PTP is suitable here.
>> Since we cannot use HW timestamping in a virtual machine (we cannot force the use of SR-IOV), I thought to run PTP on the physical machines and to sync the VMs to the host via kvm-clock.
>
> kvmclock doesn't do synchronization with host clock or UTC.
> kvmclock is based on the host's notion of *passed* time.
> kvmclock allows the guest to measure a flow of time.
>
> It is another layer's job to translate the kvmclock result into a timestamp that can be compared.  kvmclock was designed like that because KVM wants to make a guest independent of hosts.
>
> I see the system time written by KVM whenever the VM is entered, in kvm_guest_time_update().
> KVM updates system_time with its monotonic time (as you say, *passed* time).

It is boot time rather than monotonic time; boot time is equal to
monotonic time + suspend time.
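
To see the difference between the two clocks on a given Linux machine, a minimal sketch along these lines may help. It assumes Linux, where CLOCK_BOOTTIME is available; the helper names read_clock_ns() and suspended_ns() are made up for this illustration and are not kernel or KVM functions.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Read the given clock and return its value in nanoseconds, or -1 on error. */
static int64_t read_clock_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Approximate time spent suspended since boot: BOOTTIME - MONOTONIC.
 * The two reads are not atomic, so the result also contains the few
 * nanoseconds that elapse between them.
 */
static int64_t suspended_ns(void)
{
    int64_t mono = read_clock_ns(CLOCK_MONOTONIC);
    int64_t boot = read_clock_ns(CLOCK_BOOTTIME);

    if (mono < 0 || boot < 0)
        return -1;
    return boot - mono;
}
```

On a machine that has never been suspended, suspended_ns() should stay close to zero; after a suspend/resume cycle it should grow by roughly the time spent suspended.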

> How can the guest (or another layer) translate the kvmclock result into a timestamp that can be compared?
> This is much needed once I run PTP on the host, where CLOCK_REALTIME will be affected (probably not CLOCK_MONOTONIC).
>
>> But now I see that the clock in the VM is far away from the host (~ hundreds of microseconds), and this is before I even run PTP on the host...
>
> What delta do you get after running PTP in guests?
>
> I am not running PTP yet. First I want to sync the guest to the host's free-running clock; if that succeeds, I'll run PTP.
> Currently, without PTP, I see about a 1 second delta between the two clocks.
>
> (Host and guest seem somehow synchronized, because QEMU stores host time into RTC.  The guest reads RTC on boot, but that has nothing to do with kvmclock, and RTC's accuracy is poor.)
>
> I understand that - but I expect that later the 2 clocks will be in-sync.
>
>> My test is very simple: I send a packet from the host to the VM, and I set the host time (tx_time) in the packet.  When the guest receives the packet it reads its time (rx_time) and calculates the delay as:
>> Delay = rx_time - tx_time
>> I use clock_gettime(CLOCK_REALTIME) in the host to set tx_time and in the guest to read rx_time.
>> My questions :
>> 1. Assuming my HW support the paravirtualization clock requirements -
>> (see below output of cpuinfo)  ,
>
> (kvmclock doesn't have any requirements other than presence of TSC, which is why it's the default.  The guest can have requirements that aren't met on some HW, though.)
>
> I meant that my HW supports the constant TSC - hence I can rely on the TSC.
>
>> In theory, is it possible to achieve tens-of-ns accuracy between the VM clock and the host clock?
>
> It is.
>
> How? This is the only question.
>
>
> (kvmclock on new CPUs has same drift/resolution/jitter/... as TSC.
>  Reading kvmclock is slower than just doing rdtsc, but likely still within tens of nanoseconds.)
>
>> Or am I too naïve, and should I abandon the idea of running this timing-sensitive application in a VM and instead run it in a Linux container, for example?
>
> That depends on the reason behind synchronizing clocks, because VM can provide same precision as the host.  Running in containers is almost the same as running on the host, so you might prefer their trade-offs.
>
>> 2. I understand that in the  kvm-clock process, the kvm writes
>> (whenever it enters the VM)
>
> (KVM doesn't update on every entry if your machine has invariant TSC.)
>
> I have constant TSC
>> its system_time and the VM_TSC at the current time to the pvclock page; then the guest OS can calculate its current time by:
>
> KVM doesn't write its (= host's) system_time.
> KVM writes *guest's* system_time.  Guest's system_time at VM_TSC.
>
> (system_time is 0 when the VM starts.  system_time can store ~584 years worth of nanoseconds, but using an arbitrary offset makes everything simpler.  This part of kvmclock is pretty confusing, so system_time is likely the source of misunderstanding.)
>
> Have you read that kvmclock does synchronization with host time somewhere?
>
> Yes. In vcpu_enter_guest() there is a call to kvm_guest_time_update(),
> which updates the pvclock page for the guest; note that it updates system_time with the host system time.
>
> struct pvclock_vcpu_time_info {
>         u32   version;
>         u32   pad0;
>         u64   tsc_timestamp;
>         u64   system_time;
>         u32   tsc_to_system_mul;
>         s8    tsc_shift;
>         u8    flags;
>         u8    pad[2];
> } __attribute__((__packed__)); /* 32 bytes */
>
> Thanks.
>
>> Current-time = system_time + multiplier * (RDTSC() - VM_TSC)   (system_time and VM_TSC as set by the kvm)
>> I understand that there is no VM-exit when the VM calls RDTSC().
>
> Yes.
>
>> - is that description correct ?
>
> Partly.
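
For reference, the guest-side calculation over the structure quoted above can be sketched roughly as follows. This is an illustration, not the kernel's code: pvclock_ns() and scale_delta() are names chosen here, the TSC value is passed in as a parameter so the arithmetic can be checked without running in a guest (a real guest would read the TSC inside the retry loop), and unsigned __int128 assumes GCC or Clang on a 64-bit target. The point is that the "multiplier" is really a shift by tsc_shift followed by a 32.32 fixed-point multiply by tsc_to_system_mul, and that the version field guards against reading a half-updated page.

```c
#include <stdint.h>

/* Same layout as the structure quoted above, with <stdint.h> types. */
struct pvclock_vcpu_time_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;
    uint8_t  pad[2];
} __attribute__((__packed__));

/* Convert a TSC delta to nanoseconds: shift, then multiply by mul / 2^32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/*
 * Guest time in nanoseconds for a given TSC reading.  An odd version
 * means an update is in progress; a changed version means the fields
 * were rewritten while we were reading, so retry in both cases.
 */
static uint64_t pvclock_ns(const volatile struct pvclock_vcpu_time_info *ti,
                           uint64_t tsc_now)
{
    uint32_t version;
    uint64_t ns;

    do {
        version = ti->version;
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        ns = ti->system_time +
             scale_delta(tsc_now - ti->tsc_timestamp,
                         ti->tsc_to_system_mul, ti->tsc_shift);
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    } while ((version & 1) || version != ti->version);

    return ns;
}
```

Note that system_time here is the guest's system_time at tsc_timestamp, so the result is a guest-local timestamp, not host wall-clock time.
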
>
>> I understand that this is supported by the guest OS and this should be transparent to my application, correct ?
>
> Yes.
>
>> My guest and host are Fedora 22.
>>
>> 3. Other idea how to achieve this accuracy ?
>
> I hope that PTP is enough, because I can't recommend anything that can be done without introducing a new paravirtual device ... an example of potential one-time synchronization with higher accuracy than PTP:
>
> A guest asks a host what time it is by issuing a hypercall.
> The host replies with a
>   (kvmclock timestamp, nanoseconds since some standard time) pair.
>
> It's going to be more complex if you want more features.
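
If such a pair were available, the guest-side translation could look something like the sketch below. Everything here is hypothetical: struct time_pair, guest_to_host_ns() and one_way_delay_ns() are names invented for this example, and the hypercall that would fill in the pair is the proposed mechanism from the text above, not an existing interface. It also assumes the guest's kvmclock and the host clock advance at the same rate after the pair was sampled.

```c
#include <stdint.h>

/* Hypothetical reply from the host: both values sampled at the same instant. */
struct time_pair {
    uint64_t kvmclock_ns; /* guest kvmclock value at the sample point */
    int64_t  host_ns;     /* host nanoseconds since some standard time */
};

/* Translate a later kvmclock reading into the host's time base. */
static int64_t guest_to_host_ns(const struct time_pair *p,
                                uint64_t kvmclock_now_ns)
{
    /* The signed difference also handles readings taken before the pair. */
    return p->host_ns + (int64_t)(kvmclock_now_ns - p->kvmclock_ns);
}

/* One-way delay: receive time (translated to host time) minus transmit time. */
static int64_t one_way_delay_ns(const struct time_pair *p,
                                int64_t tx_host_ns,
                                uint64_t rx_kvmclock_ns)
{
    return guest_to_host_ns(p, rx_kvmclock_ns) - tx_host_ns;
}
```

The accuracy of the result is bounded by how precisely the host can sample the two values of the pair together, and the pair would need refreshing whenever the host clock is stepped or slewed.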


-- 
Regards,
Wanpeng Li