On Thu, Jun 09, 2016 at 03:25:02PM -0300, Marcelo Tosatti wrote:
> On Thu, Jun 09, 2016 at 03:09:46PM +0300, Roman Kagan wrote:
> > On Thu, Jun 09, 2016 at 12:27:10AM -0300, Marcelo Tosatti wrote:
> > > Ok, it's not feasible to keep both REF_CLOCK (MSR) and shared memory
> > > (REF_PAGE) clocks in sync.  Even if the frequency correction is the
> > > same for both, the kernel updates the monotonic clock differently
> > > from the stated frequency, that is:
> > >
> > > monotonic clock (advertised via vsyscall notifier to use mult/freq
> > > pair) != tsc*freq
> > >
> > > So the best solution IMO is:
> > >
> > > reads of guest clock = max(shared memory clock, get_kernel_ns() +
> > >                            kvmclock_offset)
> > >
> > > Where reads of guest clock include: 1) kvm_get_time_and_clockread
> > > (updates to kvmclock areas), 2) rdmsr(REF_CLOCK).
> > >
> > > Unless someone has a better idea, Roman, can you update your patch
> > > to include such a solution?  For kvm_get_time_and_clockread, we can
> > > fast-forward kvmclock_offset so that
> > >
> > > kvmclock_offset + get_kernel_ns() = shared memory clock
> >
> > I'm not sure I understand what you mean.
> >
> > ->system_time in pvclock *is* assigned kernel_ns + kvmclock_offset,
> > i.e. at the time kvm_get_time_and_clockread() runs they are in sync
> > by definition.  They can diverge later due to the different whole
> > number math applied.
>
> Sync kvmclock_offset + get_kernel_ns() = shared memory clock (what you
> read from the guest).
>
> Add a wrapper around get_kernel_ns() + kvmclock_offset reads that
> records the last returned value (last_returned_guestclock):
>
> u64 read_guest_clock(struct kvm *kvm)
> {
>         u64 ret;
>
>         mutex_lock(&guest_clock_mutex);
>         ret = get_kernel_ns() + kvm->arch.kvmclock_offset;
>         kvm->arch.last_returned_guestclock = ret;
>         mutex_unlock(&guest_clock_mutex);
>         return ret;
> }
>
> Sync code (to be executed at masterclock updates and rdmsr(REF_CLOCK)):
>
> 1. read guest shared memory = smval.
> 2. read guest clock = get_kernel_ns() + kvmclock_offset = kclock
> 3. if (kclock < smval)
>        kvmclock_offset += smval - kclock
>
> Two possibilities for the clocks' state:
>
> P1. shared memory clock > get_kernel_ns() + kvmclock_offset
> P2. shared memory clock < get_kernel_ns() + kvmclock_offset
>
> Two possibilities for guest behaviour:
>
> G1. a = shared memory clock read;
>     b = get_kernel_ns() + kvmclock_offset
>
> G1/P1:
>
> a = shared memory clock read (previous read, stored in memory)
> b = get_kernel_ns() + kvmclock_offset
>
> After the sync code above: note smval > a, so b = smval > a.
>
> G1/P2:
>
> a = shared memory clock read (previous read, stored in memory)
> b = get_kernel_ns() + kvmclock_offset
>
> a < b, fine.
>
> G2 (second possibility for guest behaviour):
>
> a = get_kernel_ns() + kvmclock_offset
> b = shared memory clock read
>
> G2/P1: fine, b > a.
>
> G2/P2:
>
> a = get_kernel_ns() + kvmclock_offset
> b = shared memory clock read
>
> So we have to either reduce get_kernel_ns() + kvmclock_offset so that
> b is larger, or stop get_kernel_ns() + kvmclock_offset.
>
> We can make get_kernel_ns() + kvmclock_offset as small as
> last_returned_guestclock (otherwise users of get_kernel_ns() +
> kvmclock_offset can see time going backwards):
>
> 0.  mutex_lock(&guest_clock_mutex);
> 01. getkernelns = get_kernel_ns();
> 1.  read guest shared memory = smval.
> 2.  kclock = getkernelns + kvmclock_offset
> 3.  if (kclock > smval)
>         kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
>                                kclock - smval)
> 4.  kclock = getkernelns + kvmclock_offset
> 5.  if (kclock > smval) {
>         schedule_timeout(kclock - smval);
>         kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
>                                kclock - smval)
>     }
> 6.  mutex_unlock(&guest_clock_mutex);
>
> That works, right?

I wouldn't say so.
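For the record, the scheme quoted above can be sanity-checked with a small
single-threaded model in plain C.  This is only a sketch of the proposal, not
kernel code: get_kernel_ns() is a stub, the mutex is elided, times are bare
nanosecond counters, and the clamp "kvmclock_offset - last_returned_guestclock"
in steps 3/5 is assumed to mean "kclock - last_returned_guestclock" (the literal
expression subtracts an absolute time from an offset; the apparent intent is to
never hand out a value below the last one returned):

```c
#include <assert.h>
#include <stdint.h>

/* Model state; stand-ins for the real KVM fields. */
static uint64_t host_ns;                 /* stand-in for get_kernel_ns() */
static int64_t  kvmclock_offset;
static uint64_t last_returned_guestclock;

static uint64_t get_kernel_ns(void) { return host_ns; }

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Every consumer of get_kernel_ns() + kvmclock_offset goes through this
 * wrapper so the last value handed out is recorded. */
static uint64_t read_guest_clock(void)
{
        uint64_t ret = get_kernel_ns() + kvmclock_offset;
        last_returned_guestclock = ret;
        return ret;
}

/* Steps 1-3 of the first variant: if the shared memory clock ran ahead,
 * fast-forward kvmclock_offset to catch up. */
static void sync_forward(uint64_t smval)
{
        uint64_t kclock = get_kernel_ns() + kvmclock_offset;

        if (kclock < smval)
                kvmclock_offset += smval - kclock;
}

/* Steps 0-6 of the second variant: if the kernel-side clock ran ahead,
 * pull kvmclock_offset back, but never below the last value already
 * returned to a reader (assumed reading of the clamp, see above).  The
 * schedule_timeout() retry of step 5 is omitted from this model. */
static void sync_backward(uint64_t smval)
{
        uint64_t getkernelns = get_kernel_ns();
        uint64_t kclock = getkernelns + kvmclock_offset;

        if (kclock > smval)
                kvmclock_offset -= min_u64(kclock - last_returned_guestclock,
                                           kclock - smval);
}
```

With this model, sync_backward() lands exactly on smval when the clamp does not
trigger, and stops at last_returned_guestclock (still ahead of smval, which is
the case where the real code would have to wait) when it does.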
First, I don't think changing kvmclock_offset is allowed other than
through ioctl(KVM_SET_CLOCK); it violates everybody's expectation that
this is the difference between the host and the guest clocks.

Second, since masterclock updates essentially do that synchronization, I
think that instead of all that complication we can just return the
host-side-calculated value of the REFERENCE_TSC_PAGE clock from
rdmsr(REF_CLOCK) when the masterclock is enabled.

> > There's also a problem that there can be an arbitrary amount of time
> > between assigning the return value for the guest's rdmsr and actually
> > entering the guest to deliver that value.
>
> Don't think that matters, see the 4 cases above.

It does, in the sense that between the point where we calculate the value
we're about to return from rdmsr(REF_CLOCK) and the time the guest
actually sees it, another vCPU can read and even update the reference TSC
page.  However, as I wrote in another message in this thread, there's no
way to guarantee that clock reads are monotonic across several vCPUs;
OTOH that doesn't violate monotonicity on a specific vCPU.

So I'll probably just add another patch adjusting rdmsr(REF_CLOCK) to
return the shared memory clock if it's enabled.

Roman.
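To make the second point concrete, the alternative can be sketched like this.
It is a hypothetical model, not KVM's actual code: the struct is a simplified
stand-in for the Hyper-V reference TSC page (whose clock is specified as
((tsc * tsc_scale) >> 64) + tsc_offset, in 100 ns units), and the helper names
are made up; the point is only that the MSR handler reuses the exact
computation the guest performs, so the two clocks cannot diverge:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the Hyper-V reference TSC page fields. */
struct ref_tsc_page {
        uint64_t tsc_scale;
        int64_t  tsc_offset;
};

/* Guest-visible clock as defined for the reference TSC page:
 * ((tsc * tsc_scale) >> 64) + tsc_offset, in 100 ns units.
 * unsigned __int128 is a GCC/Clang extension used for the 64x64->128
 * multiply. */
static uint64_t tsc_page_clock(const struct ref_tsc_page *p, uint64_t tsc)
{
        return (uint64_t)(((unsigned __int128)tsc * p->tsc_scale) >> 64)
               + p->tsc_offset;
}

/* Hypothetical rdmsr(REF_CLOCK) handler: instead of a separately
 * scaled get_kernel_ns() + kvmclock_offset, return exactly what the
 * guest would compute from the TSC page at this host TSC reading. */
static uint64_t read_ref_clock_msr(const struct ref_tsc_page *p,
                                   uint64_t host_tsc)
{
        return tsc_page_clock(p, host_tsc);
}
```

By construction the MSR value and the shared-memory value agree for the same
TSC reading, which is the whole point of the adjustment; cross-vCPU ordering
is still only as good as the discussion above allows.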