On Tue, Mar 14, 2017 at 05:58:59PM +0100, Radim Krčmář wrote:
> 2017-03-14 09:26+0100, Christoffer Dall:
> > On Mon, Mar 13, 2017 at 06:28:16PM +0100, Radim Krčmář wrote:
> >> 2017-03-08 02:57-0800, Christoffer Dall:
> >> > Hi Paolo,
> >> >
> >> > I'm looking at improving KVM/ARM a bit by calling guest_exit_irqoff
> >> > before enabling interrupts when coming back from the guest.
> >> >
> >> > Unfortunately, this appears to mess up my view of CPU usage using
> >> > something like htop on the host, because it appears all time is spent
> >> > inside the kernel.
> >> >
> >> > From my analysis, I think this is because we never handle any interrupts
> >> > before enabling interrupts, where the x86 code does its
> >> > handle_external_intr, and the result on ARM is that we never increment
> >> > jiffies before doing the vtime accounting.
> >>
> >> (Hm, the counting might be broken on nohz_full then.)
> >>
> >
> > Don't you still have a scheduler tick even with nohz_full and something
> > that will eventually update jiffies then?
>
> Probably, I don't understand jiffies accounting too well and didn't see
> anything that would bump the jiffies in or before guest_exit_irqoff().
>

As far as I understand, from my very short look at the timer code,
jiffies are updated on every tick, which can be caused by a number of
events, including *any* interrupt handler (when coming from an idle
state), soft timers, timer interrupts, and possibly other things.

> >> > So my current idea is to increment jiffies according to the clocksource
> >> > before calling guest_exit_irqoff, but this would require some main
> >> > clocksource infrastructure changes.
> >>
> >> This seems similar to calling the function from the timer interrupt.
> >> The timer interrupt would be delivered after that and only wasted time,
> >> so it might actually be slower than just delivering it before ...
> >
> > That's assuming that the timer interrupt hits at every exit.
> > I don't think that's the case, but I should measure it.
>
> There cannot be less vm exits and I think there are far more vm exits,
> but if there was no interrupt, then jiffies shouldn't raise and we would
> get the same result as with plain guest_exit_irqoff().
>

That's true if you're guaranteed to take the timer interrupts that
happen while running the guest before hitting guest_exit_irqoff(), so
that you eventually count *some* time for the guest. In the arm64 case,
if we just do guest_exit_irqoff(), we *never* account any time to the
guest.

> >> How expensive is the interrupt enable/disable cycle that this
> >> optimization saves?
> >
> > I'll have to go back and measure this bit specifically again, but I
> > recall it being a couple of hundred cycles. Not alarming, but
> > worthwhile looking into.
>
> Yeah, sounds good.
>
> >> > My question is: how important is the vtime accounting on the host from
> >> > your point of view?
> >>
> >> No idea. I'd keep the same behavior on all architectures, though.
> >>
> >> The precision of accounting is in jiffies (millions of cycles), so we
> >> could maybe move it from the hot path to vcpu_load/put(?) without
> >> affecting the count in usual cases ...
> >>
> >
> > So since sending my original e-mail I found out that the vtime
> > accounting logic was changed from ktime to jiffies, which is partly why
> > we're having problems on arm. See:
> >
> > ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
> > sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
> >
> > Moving to load/put depends on the semantics of this vtime thing. Is
> > this counting cycles spent in the VM as opposed to in the host kernel
> > and IRQ handling, and is that useful for system profiling or scheduling
> > decisions, in which case moving to vcpu_load/put doesn't work...
>
> Right.
>
> > I assume there's a good reason why we call guest_enter() and
> > guest_exit() in the hot path on every KVM architecture?
>
> I consider myself biased when it comes to jiffies, so no judgement. :)
>
> From what I see, the mode switch is used only for statistics.
> The original series is
>
> 5e84cfde51cf303d368fcb48f22059f37b3872de~1..d172fcd3ae1ca7ac27ec8904242fd61e0e11d332
>
> It didn't introduce the overhead with interrupt window and it didn't
> count host kernel irq time as user time, so it was better at that time.

Yes, but it was based on the cputime_to... functions, which I understand
use ktime, which on systems running KVM will most often read the
clocksource directly from the hardware. That was later optimized to just
use jiffies to avoid the clocksource read, because jiffies is already in
memory and adjusted to the granularity we need; in some sense an
improvement, only it doesn't work if you don't update jiffies when
needed.

Thanks,
-Christoffer