On 12/19/2010 08:16 PM, Avi Kivity wrote:
On 12/20/2010 03:15 AM, Zachary Amsden wrote:
On 12/19/2010 05:27 AM, Avi Kivity wrote:
On 12/17/2010 07:43 PM, Zachary Amsden wrote:
On 12/15/2010 10:16 AM, Julien Desfossez wrote:
Hi,
I'm currently working with the kvm clocksource and I'm wondering
if we could implement the vread function for this clock source
when we are running on a host with constant_tsc.
If I understand correctly the hv_clock structure is per_cpu
because of the eventual frequency changes, but in the case of
constant_tsc (and after validation that the TSC is synchronized
across all the cores) I think we could have a working vread function.
In case of migration, could we have a fallback in case we detect
we end up on a CPU without constant_tsc ?
Any advice/explanation would be greatly appreciated !
It's a bit more complex than that. In addition to the problem you
mention with migration, even if the TSC is synchronized, the
kvmclock still is not, even with constant_tsc. There is
measurement error in between reading the TSC and computing the
per_cpu hv_clock offset which varies between CPUs.
What about using rdtscp?
We could also disable kvmclock if constant_tsc and migration is not
desired, or if constant_tsc and the new tsc multiplier on bulldozers
is available on all machines in the migration cluster.
Even then, we need an atomic in the vread path.
Why?
RDTSCP will get you a per-CPU measured TSC+offset. But even if there is
global agreement on the TSC, there is still variation on the offset
because kvmclock offsets are computed at different times in a per-cpu
fashion.
So you can still go backwards in a global measurement with kvmclock,
even with a synchronized TSC.
The tsc multiplier does not look usable for virtualization, btw.
Why?
The documentation I've seen implies it is a single MSR that controls the
TSC multiplier. This implies two massively negative potential ways it
works:
1) It can be set during the switching of MSRs when transitioning to the
VMCB. Then, it accelerates the TSC while running in SVM. Observe that
the time spent not running in SVM for a particular VMCB, regular TSC
time will pass. This time must be accounted by counting real time and
adjusting the TSC offset each and every time the VMCB is used.
First, note that counting real time with the TSC is impossible if you
switch CPUs and the hardware TSCs are not synchronized. You must fall
back to a secondary clock source, which greatly reduces precision of the
computation.
Second, note that even if the hardware TSCs are synchronized, you are
now re-computing the offset, in fact, MUST re-compute the TSC offset at
each and every entry to the VMCB. This means SMP VMs will have
desynchronized TSCs, because every computation of the offset introduces
a variable amount of error and all VCPUs will perform this computation
at different times.
You end up back with a continuous random error applied to kvmclock / TSC
for all CPUs and the required atomic check for TSC going backwards.
So it saves the exit cost, but at the cost of even greater complexity,
which we don't need in the TSC code. The performance issue in the
migration scenario is not really that big of a deal. Migrate to a
slower machine; we catch up the TSC every chance we get to bring it back
up to speed, and indicate that the atomic backwards protection is
needed. Migrate to a faster machine; we trap the TSC. Yes, it's
expensive. It also is running on a faster machine, probably greater
than 10% faster, which is about what we'll see for overhead. So there
is no net performance degradation. Reboot the VM, maximum potential
performance is restored.
2) It isn't clear that setting the MSR affects only the TSC in SVM
mode. If that were the case, why did they not add a new VMCB field
extension, why is it a hardware MSR, which can be set outside of SVM?
The scary implication here is that the MSR may actually affect the
hardware TSC rate, in which case, using it for SVM scaling destabilizes
the host TSC.
That's completely non-usable. I hope it doesn't work that way, I hope
AMD was clever enough to use the auxiliary TSC for this, but that means
there are still complex issues when you have multiple VMs running at
different scaled TSC rates.
My take on in, either way, regardless of how it is implemented is that
basically, it adds a lot more complexity to the equation without solving
the fundamental problem. There's already a good enough software
solution, we should use that.
Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html