Re: kvmclock doesn't work, help?

Marcelo Tosatti <mtosatti@xxxxxxxxxx> · Wed, 23 Dec 2015 17:27:01 -0200

On Mon, Dec 21, 2015 at 02:49:25PM -0800, Andy Lutomirski wrote:
> On Fri, Dec 18, 2015 at 1:49 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> > On Fri, Dec 18, 2015 at 12:25:11PM -0800, Andy Lutomirski wrote:
> >> [cc: John Stultz -- maybe you have ideas on how this should best
> >> integrate with the core code]
> >>
> >> On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> 
> >> > Can you write an actual proposal (with details) that accommodates the
> >> > issue described at "Assuming a stable TSC across physical CPUS, and a
> >> > stable TSC" ?
> >> >
> >> > Yes it would be nicer, the IPIs (to stop the vcpus) are problematic for
> >> > realtime guests.
> >>
> >> This shouldn't require many details, and I don't think there's an ABI
> >> change.  The rules are:
> >>
> >> When the overall system timebase changes (e.g. when the selected
> >> clocksource changes or when update_pvclock_gtod is called), the KVM
> >> host would:
> >>
> >> optionally: preempt_disable();  /* for performance */
> >>
> >> for all vms {
> >>
> >>   for all registered pvti structures {
> >>     pvti->version++;  /* should be odd now */
> >>   }
> >
> > pvti is userspace data, so you have to pin it before?
> 
> Yes.
> 
> Fortunately, most systems probably only have one page of pvti
> structures, I think (unless there are a ton of vcpus), so the
> performance impact should be negligible.
> 
> >
> >>   /* Note: right now, any vcpu that tries to access pvti will start
> >> infinite looping.  We should add cpu_relax() to the guests. */
> >>
> >>   for all registered pvti structures {
> >>     update everything except pvti->version;
> >>   }
> >>
> >>   for all registered pvti structures {
> >>     pvti->version++;  /* should be even now */
> >>   }
> >>
> >>   cond_resched();
> >> }
> >>
> >> Is this enough detail?  This should work with all existing guests,
> >> too, unless there's a buggy guest out there that actually fails to
> >> double-check version.
> >
> > What is the advantage of this over the brute force method, given
> > that guests will busy spin?
> >
> > (busy spin is equally problematic as IPI for realtime guests).
> 
> I disagree.  It's never been safe to call clock_gettime from an RT
> task and expect a guarantee of real-time performance.  We could fix
> that, but it's not even safe on non-KVM.

The problem is how long the IPI (or, in the version scheme above, the
busy spin while version is odd) interrupts the vcpu.

> Sending an IPI *always* stalls the task.  Taking a lock (which is
> effectively what this is doing) only stalls the tasks that contend for
> the lock, which, most of the time, means that nothing stalls.
> 
> Also, if the host disables preemption or otherwise boosts its priority
> while version is odd, then the actual stall will be very short, in
> contrast to an IPI-induced stall, which will be much, much longer.
> 
> --Andy

1) The updates are rare.
2) There are no user complaints about the IPI mechanism.

Don't see a reason to change this.

For the suspend issue, though, there are complaints (guests on
laptops which fail to use masterclock).
