Re: KVM timekeeping and TSC virtualization

Zachary Amsden <zamsden@xxxxxxxxxx> · Mon, 23 Aug 2010 15:44:47 -1000

On 08/21/2010 03:32 PM, David S. Ahern wrote:

On 08/20/10 17:24, Zachary Amsden wrote:

On 08/20/2010 03:26 AM, David S. Ahern wrote:

On 08/20/10 02:07, Zachary Amsden wrote:

This patch set implements full TSC virtualization, with both
trapping and passthrough modes, and intelligent mode switching.
As a result, TSC will never go backwards, we are stable against
guest re-calibration attempts, VM reset, and migration.  For guests
which require it, the TSC khz can even be preserved on migration
to a new host.

The TSC will never be trapped on UP systems unless the host TSC
actually runs faster than the guest; other conditions, including
bad hardware and changing speeds are accommodated by using catchup
mode to keep the guest passthrough TSC in line with the host clock.
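In trapping mode the hypervisor has to produce the guest's TSC value in software on every read. The sketch below shows one way that arithmetic could look. It assumes a single vCPU, a fixed host TSC frequency, and hypothetical names (`struct vtsc`, `vtsc_read`) that are not KVM's actual code. The elapsed host ticks are scaled by guest_khz/host_khz so a migrated guest keeps its calibrated rate, and the result is clamped so it never goes backwards.

```c
#include <stdint.h>

/* Hypothetical per-vCPU TSC state; field names are illustrative, not KVM's. */
struct vtsc {
    uint64_t host_base;  /* host TSC reading at the last resync point */
    uint64_t guest_base; /* guest TSC value at that same point */
    uint32_t host_khz;   /* host TSC frequency in kHz (must be nonzero) */
    uint32_t guest_khz;  /* frequency the guest calibrated against, kHz */
    uint64_t last;       /* highest guest TSC value returned so far */
};

/* Compute the guest TSC for a host TSC reading: scale the elapsed host
 * ticks by guest_khz / host_khz, then clamp so the result never goes
 * backwards relative to the previous value this vCPU returned. */
static uint64_t vtsc_read(struct vtsc *v, uint64_t host_now)
{
    if (host_now < v->host_base)
        host_now = v->host_base;   /* avoid unsigned underflow */

    uint64_t delta = host_now - v->host_base;

    /* delta * guest_khz / host_khz, split as q * host_khz + r so the
     * intermediate products stay within 64 bits for realistic rates. */
    uint64_t q = delta / v->host_khz;
    uint64_t r = delta % v->host_khz;
    uint64_t scaled = q * v->guest_khz + (r * v->guest_khz) / v->host_khz;

    uint64_t guest = v->guest_base + scaled;
    if (guest < v->last)
        guest = v->last;           /* never let the guest see time go backwards */
    v->last = guest;
    return guest;
}
```

A per-vCPU clamp like this does nothing to keep different vCPUs consistent with each other, which is the harder problem on SMP guests.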

What's the overhead of trapping TSC reads for Nehalem-type processors?

gettimeofday() in guests is the biggest performance problem with KVM for
me, especially for older OSes like RHEL4, which is a supported OS for
another 2 years. Even with RHEL5, 32-bit, I had to force kvmclock off to
get the VM to run reliably:

http://article.gmane.org/gmane.comp.emulators.kvm.devel/51017/match=kvmclock+rhel5.5

Correctness is the biggest timekeeping problem with KVM for me.  The
fact that you had to force kvmclock off is evidence of that.  Slightly
slower applications are fine.  Broken ones are not acceptable.

I have been concerned with speed and correctness for a while:

http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg02955.html
http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg07231.html

TSC will not be trapped with kvmclock, and the bug you hit with RHEL5
kvmclock has since been fixed.  As you can see, getting all of these
issues sorted out is neither simple nor straightforward.

kvmclock is only available to guests running RHEL5.5 plus some update,
or a very recent Linux kernel. There are a lot of products running on
OSes older than that.

Also, TSC will not be trapped with UP VMs, only SMP.  If you seriously
believe RHEL4 will perform better as an SMP guest than as several
coordinated UP guests, you would worry about this issue.  I don't.
The amount of upstream scalability and performance work done since that
timeframe is enormous, to the point that it's entirely plausible that
a cluster of KVM-hosted UP RHEL4 guests is faster than a RHEL4 SMP host.

Products built on RHEL3, RHEL4 or earlier RHEL5 were developed in the
past, and their performance expectations were set on SMP, be it bare
metal or virtual. You can't expect a product to be redesigned
to run on KVM.

You can expect people to measure and use the system appropriately.
Products built on RHEL3, 4, etc. will not have the inherent SMP
scalability and therefore won't benefit as much from an SMP VM.

So the answer is - it depends.  Hardware is always getting faster, and
trap / exit cost is going down.   Right now, it is anywhere from a few
hundred to multiple thousands of cycles, depending on your hardware.  I
don't have an exact benchmark number I can quote, although in a couple
of hours, I probably will.  I'll guess 3,000 cycles.
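To put the 3,000-cycle guess in wall-clock terms, here is the arithmetic for one trapped TSC read, assuming a 3 GHz core and a hypothetical workload issuing 100,000 timestamp reads per second per vCPU (both numbers are illustrative assumptions, not measurements):

```latex
t_{\text{trap}} = \frac{3\,000\ \text{cycles}}{3 \times 10^{9}\ \text{cycles/s}} = 1\ \mu\text{s}
\qquad
\text{overhead} = 10^{5}\ \text{reads/s} \times 1\ \mu\text{s} = 0.1\ \text{s/s} = 10\%\ \text{of one vCPU}
```

At 300 cycles per exit the same workload would lose about 1%, which is why the answer depends so much on the hardware.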

I agree that gettimeofday is a huge issue for poorly written applications.

I understand it is not a simple problem, and "poorly written
applications" is a bit of a reach, don't you think? There are a number of
workloads that depend on time stamps; that does not make them poorly
designed.

The timestamp will never be unique or completely accurate.  Therefore it 
is not necessary to issue calls to get timestamps from the kernel at an 
extreme rate.

A 64-bit counter and a timestamp fetched approximately once per second 
IS unique and accurate to a 1-second value.
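The "64-bit counter plus coarse timestamp" idea can be sketched in C as follows. The names `struct stamper`, `stamper_tick`, and `stamper_next` are hypothetical; the sketch assumes a single thread and that something else (for example a once-per-second timer) calls `stamper_tick`. Each stamp is unique because the sequence number never repeats, its time component is accurate only to the second, and no kernel call happens on the per-event path.

```c
#include <stdint.h>

/* A unique stamp: coarse wall-clock seconds plus a 64-bit sequence number. */
struct stamp {
    int64_t  sec;  /* accurate only to the last stamper_tick() call */
    uint64_t seq;  /* strictly increasing, so (sec, seq) never repeats */
};

/* Hypothetical single-threaded stamp generator. */
struct stamper {
    int64_t  cached_sec; /* last second value supplied by stamper_tick() */
    uint64_t next_seq;   /* next sequence number to hand out */
};

/* Refresh the cached second, e.g. from a once-per-second timer.
 * Ignores values that would move the cached time backwards. */
static void stamper_tick(struct stamper *s, int64_t now_sec)
{
    if (now_sec > s->cached_sec)
        s->cached_sec = now_sec;
}

/* Hand out a stamp without any kernel call. */
static struct stamp stamper_next(struct stamper *s)
{
    struct stamp st = { s->cached_sec, s->next_seq++ };
    return st;
}
```

For example, call `stamper_tick(&s, (int64_t)time(NULL))` from a one-second timer and `stamper_next(&s)` per event. A multithreaded version would need an atomic increment of `next_seq`.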

On any virtualized system, unless you have dedicated virtual machines 
and a real time host operating system, you can't really guarantee better 
than 1-second time resolution to any guest.  (Pick some other resolution 
than 1-second; same argument applies).

Therefore, workloads which repeatedly issue kernel calls to get
less-than-useful information are, in fact, poorly designed, unless they
are written to run on real time host environments for dedicated RT use.

That doesn't mean we won't speed it up; in fact, I have already done
quite a bit of work on ways to reduce the exit cost.  Let's, however,
get things correct before trying to make them aggressively fast.

Zach

I have also looked at timekeeping and performance of gettimeofday on a
certain proprietary hypervisor. KVM lags severely here and workloads
dependent on timestamps are dramatically impacted. Evaluations and
decisions are made today based on current designs - both KVM and
product. Severe performance deltas raise a lot of flags.

This is laughably incorrect.

gettimeofday is faster on KVM than on anything else using a TSC-based
clock, because KVM passes the TSC through directly.  VMware traps the
TSC and is actually slower.

Can you please define your "severe performance delta" and tell us your 
benchmark methodology?  I'd like to help you figure out how it is flawed.
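As a point of reference, here is a minimal microbenchmark sketch (not the methodology anyone in this thread actually used) that measures the average cost of `gettimeofday()` by timing a tight loop with `clock_gettime(CLOCK_MONOTONIC)`. The function names `mono_ns` and `gtod_ns_per_call` are hypothetical. Running the same binary with the same iteration count on bare metal and in each guest, on an otherwise idle host, gives directly comparable per-call numbers. On a Linux guest the result depends heavily on the active clocksource, which can be read from /sys/devices/system/clocksource/clocksource0/current_clocksource.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Monotonic time in nanoseconds, used only to time the loop. */
static int64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost, in nanoseconds, of one gettimeofday() call over `iters`
 * calls. Returns 0.0 when iters is 0. */
static double gtod_ns_per_call(size_t iters)
{
    struct timeval tv;
    volatile long sink = 0; /* consume each result so the loop is not optimized away */

    int64_t start = mono_ns();
    for (size_t i = 0; i < iters; i++) {
        gettimeofday(&tv, NULL);
        sink += (long)tv.tv_usec;
    }
    int64_t end = mono_ns();
    (void)sink;

    return iters ? (double)(end - start) / (double)iters : 0.0;
}
```

For example, `printf("%.1f ns/call\n", gtod_ns_per_call(10000000));` reports the per-call average; pinning the process (and, in a guest, the vCPU) to one CPU reduces noise.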

Zach