On Mon, Dec 21, 2015 at 02:49:25PM -0800, Andy Lutomirski wrote:
> On Fri, Dec 18, 2015 at 1:49 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> > On Fri, Dec 18, 2015 at 12:25:11PM -0800, Andy Lutomirski wrote:
> >> [cc: John Stultz -- maybe you have ideas on how this should best
> >> integrate with the core code]
> >>
> >> On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> >> > Can you write an actual proposal (with details) that accommodates the
> >> > issue described at "Assuming a stable TSC across physical CPUs, and a
> >> > stable TSC" ?
> >> >
> >> > Yes, it would be nicer; the IPIs (to stop the vcpus) are problematic
> >> > for realtime guests.
> >>
> >> This shouldn't require many details, and I don't think there's an ABI
> >> change.  The rules are:
> >>
> >> When the overall system timebase changes (e.g. when the selected
> >> clocksource changes or when update_pvclock_gtod is called), the KVM
> >> host would:
> >>
> >>     optionally: preempt_disable(); /* for performance */
> >>
> >>     for all vms {
> >>         for all registered pvti structures {
> >>             pvti->version++; /* should be odd now */
> >>         }
> >
> > pvti is userspace data, so you have to pin it before?
>
> Yes.
>
> Fortunately, most systems probably only have one page of pvti
> structures, I think (unless there are a ton of vcpus), so the
> performance impact should be negligible.
>
> >
> >>         /* Note: right now, any vcpu that tries to access pvti will
> >>          * start infinite looping.  We should add cpu_relax() to the
> >>          * guests. */
> >>
> >>         for all registered pvti structures {
> >>             update everything except pvti->version;
> >>         }
> >>
> >>         for all registered pvti structures {
> >>             pvti->version++; /* should be even now */
> >>         }
> >>
> >>         cond_resched();
> >>     }
> >>
> >> Is this enough detail?  This should work with all existing guests,
> >> too, unless there's a buggy guest out there that actually fails to
> >> double-check version.
> >
> > What is the advantage of this over the brute force method, given
> > that guests will busy spin?
> >
> > (busy spin is equally problematic as IPI for realtime guests).
>
> I disagree.  It's never been safe to call clock_gettime from an RT
> task and expect a guarantee of real-time performance.  We could fix
> that, but it's not even safe on non-KVM.

The problem is how long the IPI (or the busy spinning, in the case of
the version scheme above) interrupts the vcpu.

> Sending an IPI *always* stalls the task.  Taking a lock (which is
> effectively what this is doing) only stalls the tasks that contend for
> the lock, which, most of the time, means that nothing stalls.
>
> Also, if the host disables preemption or otherwise boosts its priority
> while version is odd, then the actual stall will be very short, in
> contrast to an IPI-induced stall, which will be much, much longer.
>
> --Andy

1) The updates are rare.
2) There are no user complaints about the IPI mechanism.

I don't see a reason to change this.

For the suspend issue, though, there are complaints (guests on laptops
which fail to use the masterclock).

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html