On Fri, Dec 18, 2015 at 1:49 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> On Fri, Dec 18, 2015 at 12:25:11PM -0800, Andy Lutomirski wrote:
>> [cc: John Stultz -- maybe you have ideas on how this should best
>> integrate with the core code]
>>
>> On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>> > Can you write an actual proposal (with details) that accommodates the
>> > issue described at "Assuming a stable TSC across physical CPUs, and a
>> > stable TSC" ?
>> >
>> > Yes it would be nicer, the IPIs (to stop the vcpus) are problematic for
>> > realtime guests.
>>
>> This shouldn't require many details, and I don't think there's an ABI
>> change.  The rules are:
>>
>> When the overall system timebase changes (e.g. when the selected
>> clocksource changes or when update_pvclock_gtod is called), the KVM
>> host would:
>>
>> optionally: preempt_disable();  /* for performance */
>>
>> for all vms {
>>
>>   for all registered pvti structures {
>>     pvti->version++;  /* should be odd now */
>>   }
>
> pvti is userspace data, so you have to pin it before?

Yes.  Fortunately, most systems probably only have one page of pvti
structures, I think (unless there are a ton of vcpus), so the
performance impact should be negligible.

>
>>   /* Note: right now, any vcpu that tries to access pvti will start
>>    * infinite looping.  We should add cpu_relax() to the guests. */
>>
>>   for all registered pvti structures {
>>     update everything except pvti->version;
>>   }
>>
>>   for all registered pvti structures {
>>     pvti->version++;  /* should be even now */
>>   }
>>
>>   cond_resched();
>> }
>>
>> Is this enough detail?  This should work with all existing guests,
>> too, unless there's a buggy guest out there that actually fails to
>> double-check version.
>
> What is the advantage of this over the brute force method, given
> that guests will busy spin?
>
> (busy spin is equally problematic as IPI for realtime guests).

I disagree.  It's never been safe to call clock_gettime from an RT
task and expect a guarantee of real-time performance.  We could fix
that, but it's not even safe on non-KVM.

Sending an IPI *always* stalls the task.  Taking a lock (which is
effectively what this is doing) only stalls the tasks that contend for
the lock, which, most of the time, means that nothing stalls.  Also,
if the host disables preemption or otherwise boosts its priority while
version is odd, then the actual stall will be very short, in contrast
to an IPI-induced stall, which will be much, much longer.

--Andy
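
[Editorial note: below is a minimal sketch of the guest-side read loop
that the version protocol above relies on (odd version = update in
progress, re-read if version changed).  It assumes x86-64 with GCC/Clang
extensions (inline asm, unsigned __int128) and the pvclock_vcpu_time_info
layout from the pvclock ABI; the function names pvclock_read_ns and
scale_delta are illustrative, not kernel APIs, and the sketch omits the
PVCLOCK_TSC_STABLE_BIT and migration handling the real vDSO code does.]

    #include <stdint.h>

    struct pvclock_vcpu_time_info {
            uint32_t version;
            uint32_t pad0;
            uint64_t tsc_timestamp;
            uint64_t system_time;
            uint32_t tsc_to_system_mul;
            int8_t   tsc_shift;
            uint8_t  flags;
            uint8_t  pad[2];
    } __attribute__((__packed__));

    static inline uint64_t rdtsc(void)
    {
            uint32_t lo, hi;
            asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
            return ((uint64_t)hi << 32) | lo;
    }

    static inline void cpu_relax(void)
    {
            asm volatile("pause" ::: "memory");
    }

    /* Convert a TSC delta to ns using the scale/shift from pvti. */
    static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
    {
            if (shift < 0)
                    delta >>= -shift;
            else
                    delta <<= shift;
            return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
    }

    uint64_t pvclock_read_ns(volatile struct pvclock_vcpu_time_info *pvti)
    {
            uint32_t version;
            uint64_t tsc, ns;

            do {
                    version = pvti->version;
                    /* Odd version: host is mid-update; spin politely. */
                    while (version & 1) {
                            cpu_relax();
                            version = pvti->version;
                    }
                    asm volatile("" ::: "memory"); /* read fields after version */

                    tsc = rdtsc();
                    ns = pvti->system_time +
                         scale_delta(tsc - pvti->tsc_timestamp,
                                     pvti->tsc_to_system_mul, pvti->tsc_shift);

                    asm volatile("" ::: "memory"); /* re-check version last */
            } while (pvti->version != version);

            return ns;
    }

The cpu_relax() in the spin loop is the change Andy suggests above: a
guest that observes an odd version simply backs off until the host
finishes its update, rather than being interrupted by an IPI.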