Re: vread in kvm_clock

On 12/19/2010 08:16 PM, Avi Kivity wrote:
On 12/20/2010 03:15 AM, Zachary Amsden wrote:
On 12/19/2010 05:27 AM, Avi Kivity wrote:
On 12/17/2010 07:43 PM, Zachary Amsden wrote:
On 12/15/2010 10:16 AM, Julien Desfossez wrote:
Hi,

I'm currently working with the kvm clocksource and I'm wondering if we could implement the vread function for this clock source when we are running on a host with constant_tsc. If I understand correctly, the hv_clock structure is per_cpu because of possible frequency changes, but in the case of constant_tsc (and after validating that the TSC is synchronized across all the cores) I think we could have a working vread function.

In the case of migration, could we have a fallback if we detect that we end up on a CPU without constant_tsc?

Any advice/explanation would be greatly appreciated!

It's a bit more complex than that. In addition to the problem you mention with migration, even if the TSC is synchronized, kvmclock still is not, even with constant_tsc. There is measurement error between reading the TSC and computing the per_cpu hv_clock offset, and that error varies between CPUs.
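For reference, here is a minimal sketch (not the actual kernel implementation) of how a kvmclock read turns a raw TSC value into nanoseconds from the per-cpu hv_clock parameters. The field names follow the real struct pvclock_vcpu_time_info, but the helper names are made up for this sketch and the version/retry protocol is omitted; the point is that tsc_timestamp and system_time are sampled separately on each CPU, which is where the per-cpu error comes from:

#include <stdint.h>

/*
 * Per-cpu parameters published by the host.  Field names follow
 * struct pvclock_vcpu_time_info, but this is a simplified sketch:
 * the version field and its retry loop are omitted.
 */
struct hv_clock {
        uint64_t tsc_timestamp;      /* guest TSC when the host sampled time */
        uint64_t system_time;        /* nanoseconds at tsc_timestamp */
        uint32_t tsc_to_system_mul;  /* TSC ticks -> ns scale, a 32.32 fraction */
        int8_t   tsc_shift;
};

static inline uint64_t rdtsc_ordered(void)
{
        uint32_t lo, hi;
        __asm__ __volatile__("lfence; rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
}

static uint64_t kvmclock_scale(const struct hv_clock *hv, uint64_t tsc)
{
        uint64_t delta = tsc - hv->tsc_timestamp;

        if (hv->tsc_shift >= 0)
                delta <<= hv->tsc_shift;
        else
                delta >>= -hv->tsc_shift;

        /* 64x32 multiply, keep bits 32..95 of the product (elapsed ns) */
        return hv->system_time +
               (uint64_t)(((__uint128_t)delta * hv->tsc_to_system_mul) >> 32);
}

static uint64_t kvmclock_read(const struct hv_clock *hv)
{
        /*
         * The delta is measured against *this* CPU's snapshot.  Each CPU's
         * (tsc_timestamp, system_time) pair was computed at a slightly
         * different instant, so two CPUs can disagree by that error even
         * when the raw TSCs themselves agree.
         */
        return kvmclock_scale(hv, rdtsc_ordered());
}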


What about using rdtscp?

We could also disable kvmclock if the host has constant_tsc and migration is not desired, or if constant_tsc and the new TSC multiplier on Bulldozer are available on all machines in the migration cluster.

Even then, we need an atomic in the vread path.

Why?

RDTSCP will get you a per-CPU measured TSC+offset. But even if there is global agreement on the TSC, there is still variation in the offset, because kvmclock offsets are computed at different times in a per-cpu fashion.

So you can still go backwards in a global measurement with kvmclock, even with a synchronized TSC.
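To make that concrete, a vread along these lines would pick the structure for the CPU it actually read the TSC on, yet the parameters inside it were still sampled at a different time than on every other CPU. This is purely a sketch: hv_clock_for_cpu() and its mapping into the vsyscall page are assumptions, not existing kernel interfaces, and kvmclock_scale() is the helper from the sketch above:

#include <stdint.h>

struct hv_clock;   /* as in the sketch above */

/*
 * Hypothetical helpers for this sketch: a lookup of the per-cpu
 * structure that would have to be mapped read-only into userspace,
 * and the scaling routine shown earlier.
 */
extern const struct hv_clock *hv_clock_for_cpu(unsigned int cpu);
extern uint64_t kvmclock_scale(const struct hv_clock *hv, uint64_t tsc);

/*
 * rdtscp returns the TSC in edx:eax and IA32_TSC_AUX in ecx; Linux
 * loads TSC_AUX with (node << 12) | cpu, so the reader learns which
 * CPU the TSC was sampled on without a syscall.
 */
static inline uint64_t rdtscp(uint32_t *cpu)
{
        uint32_t lo, hi, aux;
        __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
        *cpu = aux & 0xfff;
        return ((uint64_t)hi << 32) | lo;
}

static uint64_t kvmclock_vread(void)
{
        uint32_t cpu;
        uint64_t tsc = rdtscp(&cpu);

        /*
         * Right structure for this CPU, but its tsc_timestamp/system_time
         * pair was sampled at a different instant than on other CPUs, so
         * comparing readings taken on two CPUs can still see time go
         * backwards.
         */
        return kvmclock_scale(hv_clock_for_cpu(cpu), tsc);
}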


The tsc multiplier does not look usable for virtualization, btw.

Why?


The documentation I've seen implies it is a single MSR that controls the TSC multiplier. That suggests two potential ways it could work, both of them massively negative:


1) It can be set during the switching of MSRs when transitioning to the VMCB. Then, it accelerates the TSC while running in SVM. Observe that during the time a particular VMCB is not running in SVM, regular TSC time will pass. This time must be accounted for by counting real time and adjusting the TSC offset each and every time the VMCB is entered.

First, note that counting real time with the TSC is impossible if you switch CPUs and the hardware TSCs are not synchronized. You must fall back to a secondary clock source, which greatly reduces the precision of the computation.

Second, note that even if the hardware TSCs are synchronized, you are now re-computing the offset; in fact, you MUST re-compute the TSC offset at each and every entry to the VMCB. This means SMP VMs will have desynchronized TSCs, because every computation of the offset introduces a variable amount of error, and all VCPUs will perform this computation at different times.
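In other words, every entry ends up doing something like the sketch below. The helper names and the kHz arithmetic are illustrative assumptions, not KVM code; the point is that the two clock reads happen at slightly different instants, independently on every vcpu:

#include <stdint.h>

extern uint64_t host_tsc_now(void);       /* assumed: read the host TSC */
extern uint64_t ns_since_vmexit(void);    /* assumed: from a secondary clocksource */

/*
 * Re-derive the guest TSC offset before VMRUN so that the scaled guest
 * TSC appears to have kept running while this vcpu was scheduled out.
 */
static uint64_t recompute_tsc_offset(uint64_t guest_tsc_at_exit,
                                     uint32_t guest_tsc_khz)
{
        /* What the guest TSC "should" read right now, at the scaled rate. */
        uint64_t guest_tsc_now = guest_tsc_at_exit +
                (uint64_t)((__uint128_t)ns_since_vmexit() * guest_tsc_khz / 1000000ULL);

        /*
         * ns_since_vmexit() and host_tsc_now() are sampled at slightly
         * different instants, and each vcpu does this independently, so
         * the offsets of an SMP guest's vcpus drift apart over time.
         */
        return guest_tsc_now - host_tsc_now();
}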

You end up back with a continuous random error applied to kvmclock / TSC on all CPUs, and with the required atomic check for the TSC going backwards.
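That backwards protection is, in essence, the same global ratchet pvclock already needs: every reader atomically clamps its result against the last value returned anywhere in the system. A minimal sketch, using C11 atomics rather than the kernel's primitives:

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t last_value;

/*
 * Clamp a raw per-cpu kvmclock reading so that no caller, on any CPU,
 * ever sees the clock move backwards.  This globally shared atomic in
 * every read is the cost the scaling MSR does not remove.
 */
static uint64_t monotonic_read(uint64_t raw_ns)
{
        uint64_t last = atomic_load(&last_value);

        do {
                if (raw_ns <= last)
                        return last;   /* another reader is already ahead */
        } while (!atomic_compare_exchange_weak(&last_value, &last, raw_ns));

        return raw_ns;
}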

So it saves the exit cost, but at the cost of even greater complexity, which we don't need in the TSC code. The performance issue in the migration scenario is not really that big of a deal. Migrate to a slower machine, and we catch up the TSC every chance we get to bring it back up to speed, and indicate that the atomic backwards protection is needed. Migrate to a faster machine, and we trap the TSC. Yes, that's expensive, but it is also running on a faster machine, probably greater than 10% faster, which is about what we'll see for the trapping overhead, so there is no net performance degradation. Reboot the VM, and maximum potential performance is restored.


2) It isn't clear that setting the MSR affects only the TSC in SVM mode. If that were the case, why did they not add a new VMCB field extension? Why is it a hardware MSR, which can be set outside of SVM?

The scary implication here is that the MSR may actually affect the hardware TSC rate, in which case, using it for SVM scaling destabilizes the host TSC.

That's completely unusable. I hope it doesn't work that way; I hope AMD was clever enough to use the auxiliary TSC for this, but even then there are still complex issues when you have multiple VMs running at different scaled TSC rates.


My take on it, either way, regardless of how it is implemented, is that it basically adds a lot more complexity to the equation without solving the fundamental problem. There's already a good enough software solution; we should use that.

Zach

