Re: [patch 2/3] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-17 15:41-0200, Marcelo Tosatti:
> On Wed, Dec 17, 2014 at 03:58:13PM +0100, Radim Krcmar wrote:
> > 2014-12-16 09:08-0500, Marcelo Tosatti:
> > > +	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
> > > +	apic->lapic_timer.expired_tscdeadline = 0;
> > > +	guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
> > > +
> > > +	while (guest_tsc < tsc_deadline) {
> > > +		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
> > 
> > Why break the __delay() loop into smaller parts?
> 
> So that you can handle interrupts, in case this code ever moves
> outside an IRQ-protected region.

__delay() handles that only if it is delay_tsc(), which already has it
covered ...
(It even copes with being rescheduled to a CPU with an unsynchronized
 TSC.)
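
To back that up, delay_tsc() has roughly this shape (condensed from
arch/x86/lib/delay.c, so details may differ between kernel versions):

  static void delay_tsc(unsigned long loops)
  {
          u64 bclock, now;
          int cpu;

          preempt_disable();
          cpu = smp_processor_id();
          bclock = native_read_tsc();
          for (;;) {
                  now = native_read_tsc();
                  if (now - bclock >= loops)
                          break;
                  /* allow RT tasks to run */
                  preempt_enable();
                  rep_nop();
                  preempt_disable();
                  /* moved to a CPU with an unsynchronized TSC?
                   * rebalance the remaining delay there */
                  if (unlikely(cpu != smp_processor_id())) {
                          loops -= now - bclock;
                          cpu = smp_processor_id();
                          bclock = native_read_tsc();
                  }
          }
          preempt_enable();
  }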

Ignoring those details, delay_tsc(delay) translates roughly to

  end = read_tsc() + delay;
  while (read_tsc() < end);

so our while loop effectively has this structure

  while ((guest_tsc = read_tsc()) < tsc_deadline) {
    end = read_tsc() + min(tsc_deadline - guest_tsc, 1000);
    while (read_tsc() < end);
  }

which complicates our original idea of

  while (read_tsc() < tsc_deadline);

(but I'm completely fine with it.)
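
Spelled out with the names from the patch (cpu_relax() added as the
usual busy-wait courtesy), that original idea would be just

  while (kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc()) < tsc_deadline)
          cpu_relax();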

> > > +		__delay(delay);
> > 
> > (Does not have to call delay_tsc, but I guess it won't change.)
> > 
> > > +		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
> > > +	}
> > >  }
> > >  
> > 
> > Btw. did simple automatic delta tuning give worse results?
> 
> Haven't tried automatic tuning.
> 
> So what happens in a realtime environment is this: you execute a fixed
> number of instructions from interrupt handling all the way to VM-entry.
> 
> Well, almost fixed. What is fixed is the number of apic_timer_fn plus
> KVM instructions; on top of that you can also execute host scheduler
> and timekeeping processing.
> 
> In practice, the time to execute that instruction sequence follows a
> bell-shaped distribution around the average (the right tail is slightly
> heavier due to host scheduler and timekeeping processing).
> 
> You want to advance the timer by the rightmost bucket; that way you
> guarantee the lowest possible latencies (which is the interest here).
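
(Concretely: if that path takes between 2000 and 4000 guest TSC cycles,
 advancing the expiration by 4000 means the slowest runs still inject on
 time, while the fastest ones busy-wait up to ~2000 cycles in the loop
 above.  The numbers are made up, just to illustrate.)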

(Lower latencies would likely be achieved by having a timer that issues
 posted interrupts from another CPU, with the guest set to busy idle.)

> That said, I don't see an advantage in automatic tuning for the use
> case this targets.

Thanks, one more manually tuned option doesn't make much difference in
the long RT setup checklist.


---
I was asking just because I consider programming to equal automation ...
If we know that we will always set this to the rightmost bucket anyway,
it could be done like this

  if ((s64)(delta = guest_tsc - tsc_deadline) > 0)
    tsc_deadline_delta += delta;
  ...
  advance_ns = kvm_tsc_to_ns(tsc_deadline_delta);

instead of a script that runs a test and sets the variable.
(On the other hand, it would probably have to be more complicated to
 reach the same level of flexibility.)
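
Fleshed out a bit, purely as a sketch (none of these names exist in the
patch, and kvm_tsc_to_ns is made up as well):

  /* track how late the interrupt was actually injected and grow the
   * advancement by the residual lateness */
  static void update_tsc_deadline_delta(struct kvm_lapic *apic,
                                        u64 guest_tsc, u64 tsc_deadline)
  {
          s64 delta = guest_tsc - tsc_deadline;

          if (delta > 0)
                  apic->lapic_timer.tsc_deadline_delta += delta;
  }

  ...

  /* when arming the hrtimer, expire this much earlier */
  advance_ns = kvm_tsc_to_ns(apic->lapic_timer.tsc_deadline_delta);

(It only ever grows, so the flexibility I mentioned would mean at least
 a clamp, and probably a slow decay, before trusting it unattended.)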