On Mon, Jun 13, 2016 at 08:07:47PM +0300, Roman Kagan wrote: > On Thu, Jun 09, 2016 at 03:25:02PM -0300, Marcelo Tosatti wrote: > > On Thu, Jun 09, 2016 at 03:09:46PM +0300, Roman Kagan wrote: > > > On Thu, Jun 09, 2016 at 12:27:10AM -0300, Marcelo Tosatti wrote: > > > > Ok, its not feasible to keep both REF_CLOCK (MSR) and shared memory > > > > (REF_PAGE) clocks in sync. Even if the frequency correction is the same > > > > for both, the kernel updates monotonic clock differently than the > > > > stated frequency that is: > > > > > > > > monotonic clock (advertised via vsyscall notifier to use mult/freq pair) != tsc*freq > > > > > > > > So the best solution IMO is to: > > > > > > > > reads of guest clock = max(shared memory clock, get_kernel_ns() + > > > > kvmclock_offset) > > > > > > > > Where reads of guest clock include: 1) kvm_get_time_and_clockread > > > > (updates to kvmclock areas), 2) rdmsr(REF_CLOCK). > > > > > > > > Unless someone has a better idea, Roman, can you update your patch to > > > > include such solution? for kvm_get_time_and_clockread, can fast forward > > > > kvmclock_offset so that > > > > > > > > kvmclock_offset + get_kernel_ns() = shared memory clock > > > > > > I'm not sure I understand what you mean. > > > > > > ->system_time in pvclock *is* assigned kernel_ns + kvmclock_offset, i.e. > > > at the time kvm_get_time_and_clockread() runs they are in sync by > > > definition. They can diverge later due to different whole number math > > > applied. > > > > Sync kvmclock_offset + get_kernel_ns() = shared memory clock (what you > > read from the guest). > > > > > > Add wrapper around get_kernel_ns + kvmclock_offset reads: > > > > Record somewhere the last returned (last_returned_guestclock). > > > > u64 read_guest_clock(struct kvm *kvm) > > { > > mutex_lock(&guest_clock_mutex); > > ret = get_kernel_ns() + kvmclock_offset; > > kvm->arch.last_returned_guestclock = ret; > > mutex_unlock(&guest_clock_mutex); > > } > > > > > > Sync code (to be executed at masterclock updates and rdmsr(REF_CLOCK)): > > > > 1. read guest shared memory = smval. > > 2. read guest clock = get_kernel_ns() + kvmclock_offset = kclock > > 3. if (kclock < smval) > > kvmclock_offset += smval - kclock > > > > > > Two possibilites for clocks state: > > > > P1. shared memory clock > get_kernel_ns() + kvmclock_offset > > P2. shared memory clock < get_kernel_ns() + kvmclock_offset > > > > Two possibilites for guest behaviour: > > G1. a = shared memory clock read; > > b = get_kernel_ns() + kvmclock_offset > > > > G1/P1: > > > > a = shared memory clock read (previous read, stored in memory) > > b = get_kernel_ns() + kvmclock_offset > > > > After sync code above: Note smval > a, so b = smval > a > > > > G1/P2: > > > > a = shared memory clock read (previous read, stored in memory) > > b = get_kernel_ns() + kvmclock_offset > > > > a < b, fine. > > > > G2 (second possibility for guest behaviour) > > a = get_kernel_ns() + kvmclock_offset > > b = shared memory clock read > > > > G2/P1: fine, b > a. > > > > G2/P2: > > a = get_kernel_ns() + kvmclock_offset > > > b = shared memory clock read > > > > So we have to either reduce get_kernel_ns() + kvmclock_offset so that > > b is larger or 'stop get_kernel_ns() + kvmclock_offset'. > > > > Can make get_kernel_ns() + kvmclock_offset be as small as > > last_returned_guestclock (otherwise users of get_kernel_ns() + > > kvmclock_offset can see time backwards). > > > > 0. mutex_lock(&guest_clock_mutex); > > 01. getkernelns = get_kernel_ns(); > > 1. read guest shared memory = smval. > > 2. kclock = getkernelns + kvmclock_offset > > 3. if (kclock > smval) > > kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock, > > kclock - smval) > > 4. kclock = getkernelns + kvmclock_offset > > 5. if (kclock > smval) { > > schedule_timeout(kclock - smval); > > kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock, > > kclock - smval) > > } > > 6. mutex_unlock(&guest_clock_mutex); > > > > That works, right? > > I wouldn't say so. > > First, I don't think changing kvmclock_offset is allowed other than > through ioctl(KVM_SET_CLOCK); Says who? > it violates everybody's expectation that > this is the difference between the host and the guest clocks. Who's expectation? userspace is supposed to, when the guest starts issue KVM_SET_CLOCK. When the guest stops, issue KVM_GET_CLOCK (save value), and issue again KVM_SET_CLOCK (with previously saved value). Can't see what the problem is. > Second, since masterclock updates essentially do that synchronization, I > think that instead of all that compilcation we can just return > host-side-calculated value of REFERENCE_TSC_PAGE clock from > rdmsr(REF_CLOCK) if masterclock is enabled. Yes, that is simpler, but i also wanted to deal with the potential backwards event that happens at masterclock update time (that is, if you are going to update refclock page from get_kernel_ns() + kvmclock_offset, need to avoid backwards event in case get_kernel_ns() + kvmclock_offset < refclock. Which is the problem in the G2/P2 case above. But that can be done separately, so yeah sure, returning host-side-calculated value of REFERENCE_TSC_PAGE clock from rdmsr(REF_CLOCK) deals with the monotonicity issue. > > > There's also a problem that there can be arbitrary amount of time > > > between assigning the return value for guest's rdmsr and actually > > > entering the guest to deliver that value. > > > > Don't think that matters, see the 4 cases above. > > It does in the sense that between the point where we calculate the value > we're about to return from rdmsr(REF_CLOCK), and the time the guest will > actually see it, another vCPU can read and even update the reference tsc > page. Not sure i see, can you outline the problem in the style of the 4 cases above? (time diagrams). > However, as I wrote in another message in this thread, there's > no way to guarantee clock reads to be monotonic across several vCPUs; > OTOH that doesn't violate the monotonicity on a specific vCPU. Yes it does and that has been a concern with Linux for a long time. See time-warp-test.c testcase for example. > So I'll probably just add another patch adjusting rdmsr(REF_CLOCK) to > return shared memory clock if it's enabled. > > Roman. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html