On Thu, Jun 09, 2016 at 03:09:46PM +0300, Roman Kagan wrote:
> On Thu, Jun 09, 2016 at 12:27:10AM -0300, Marcelo Tosatti wrote:
> > Ok, it's not feasible to keep both REF_CLOCK (MSR) and shared memory
> > (REF_PAGE) clocks in sync. Even if the frequency correction is the
> > same for both, the kernel updates the monotonic clock differently
> > from the stated frequency, that is:
> >
> > monotonic clock (advertised via the vsyscall notifier as a mult/freq
> > pair) != tsc*freq
> >
> > So the best solution IMO is:
> >
> > reads of guest clock = max(shared memory clock, get_kernel_ns() +
> >                            kvmclock_offset)
> >
> > Where reads of the guest clock include: 1) kvm_get_time_and_clockread
> > (updates to the kvmclock areas), and 2) rdmsr(REF_CLOCK).
> >
> > Unless someone has a better idea, Roman, can you update your patch to
> > include such a solution? For kvm_get_time_and_clockread, we can fast
> > forward kvmclock_offset so that
> >
> > kvmclock_offset + get_kernel_ns() = shared memory clock
>
> I'm not sure I understand what you mean.
>
> ->system_time in pvclock *is* assigned kernel_ns + kvmclock_offset, i.e.
> at the time kvm_get_time_and_clockread() runs they are in sync by
> definition. They can diverge later due to different whole number math
> applied.

Sync kvmclock_offset + get_kernel_ns() = shared memory clock (what you
read from the guest).

Add a wrapper around get_kernel_ns() + kvmclock_offset reads, and record
the last value returned (last_returned_guestclock):

u64 read_guest_clock(struct kvm *kvm)
{
	u64 ret;

	mutex_lock(&kvm->arch.guest_clock_mutex);
	ret = get_kernel_ns() + kvm->arch.kvmclock_offset;
	kvm->arch.last_returned_guestclock = ret;
	mutex_unlock(&kvm->arch.guest_clock_mutex);

	return ret;
}

Sync code (to be executed at masterclock updates and rdmsr(REF_CLOCK)):

1. read guest shared memory: smval.
2. read guest clock: kclock = get_kernel_ns() + kvmclock_offset.
3. if (kclock < smval)
	kvmclock_offset += smval - kclock;

Two possibilities for the clocks' state:

P1. shared memory clock > get_kernel_ns() + kvmclock_offset
P2.
shared memory clock < get_kernel_ns() + kvmclock_offset

Two possibilities for guest behaviour:

G1. a = shared memory clock read; b = get_kernel_ns() + kvmclock_offset
G2. a = get_kernel_ns() + kvmclock_offset; b = shared memory clock read

G1/P1:
	a = shared memory clock read (previous read, stored in memory)
	b = get_kernel_ns() + kvmclock_offset
	After the sync code above, kclock = smval, and smval >= a,
	so b = smval >= a: fine.

G1/P2:
	a = shared memory clock read (previous read, stored in memory)
	b = get_kernel_ns() + kvmclock_offset
	a < b: fine.

G2/P1:
	fine, b > a.

G2/P2:
	a = get_kernel_ns() + kvmclock_offset > b = shared memory clock read

So for G2/P2 we have to either reduce get_kernel_ns() + kvmclock_offset
so that b is larger, or "stop" get_kernel_ns() + kvmclock_offset.  We can
only make get_kernel_ns() + kvmclock_offset as small as
last_returned_guestclock (otherwise users of get_kernel_ns() +
kvmclock_offset can see time going backwards):

0. mutex_lock(&guest_clock_mutex);
1. getkernelns = get_kernel_ns();
2. read guest shared memory: smval.
3. kclock = getkernelns + kvmclock_offset;
4. if (kclock > smval)
	kvmclock_offset -= min(kclock - last_returned_guestclock,
			       kclock - smval);
5. kclock = getkernelns + kvmclock_offset;
6. if (kclock > smval) {
	schedule_timeout(kclock - smval);
	kvmclock_offset -= min(kclock - last_returned_guestclock,
			       kclock - smval);
   }
7. mutex_unlock(&guest_clock_mutex);

That works, right?  I'll code that patch next week if you don't beat me
to it.

> There's also a problem that there can be arbitrary amount of time
> between assigning the return value for guest's rdmsr and actually
> entering the guest to deliver that value.

I don't think that matters, see the 4 cases above.

> In general I'm starting to feel the shared memory clock is trying to
> provide stronger guarantees than really useful.  E.g.
> there's no such thing as synchronous TSC between vCPUs in a virtual
> machine, so every guest assuming it is broken;

There is: the TSC is monotonic across pCPUs:

	pCPU1           | pCPU2
	1. a = read tsc |
	                | 2. b = read tsc

so b >= a is guaranteed.

> in reality that means that every sane guest
> must tolerate certain violations of monotonicity when multiple CPUs are
> used for timekeeping.  I wonder if this consideration can allow for some
> simplification of the paravirtual clock code...

I think applications can fail.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html