On Wed, Dec 17, 2014 at 08:36:27PM +0100, Radim Krcmar wrote:
> 2014-12-17 15:41-0200, Marcelo Tosatti:
> > On Wed, Dec 17, 2014 at 03:58:13PM +0100, Radim Krcmar wrote:
> > > 2014-12-16 09:08-0500, Marcelo Tosatti:
> > > > +	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
> > > > +	apic->lapic_timer.expired_tscdeadline = 0;
> > > > +	guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
> > > > +
> > > > +	while (guest_tsc < tsc_deadline) {
> > > > +		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
> > >
> > > Why break the __delay() loop into smaller parts?
> >
> > So that you can handle interrupts, in case this code ever moves
> > outside IRQ protected region.
>
> __delay() works only if it is delay_tsc(), which has this handled ...
> (It even considers rescheduling with unsynchronized TSC.)
>
> delay_tsc(delay) translates roughly to
>
>   end = read_tsc() + delay;
>   while (read_tsc() < end);
>
> so the code of our while loop has a structure like
>
>   while ((guest_tsc = read_tsc()) < tsc_deadline) {
>     end = read_tsc() + min(tsc_deadline - guest_tsc, 1000);
>     while (read_tsc() < end);
>   }
>
> which complicates our original idea of
>
>   while (read_tsc() < tsc_deadline);
>
> (but I'm completely fine with it.)

True. I can change to a direct wait if that is preferred.

> > > > +		__delay(delay);
> > >
> > > (Does not have to call delay_tsc, but I guess it won't change.)
> > >
> > > > +		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
> > > > +	}
> > > >  }
> > > >
> > >
> > > Btw. simple automatic delta tuning had worse results?
> >
> > Haven't tried automatic tuning.
> >
> > So what happens on a realtime environment is this: you execute the fixed
> > number of instructions from interrupt handling all the way to VM-entry.
> >
> > Well, almost fixed. Fixed is the number of apic_timer_fn plus KVM
> > instructions. You can also execute host scheduler and timekeeping
> > processing.
> >
> > In practice, the length to execute that instruction sequence is a bell
> > shaped normal distribution around the average (the right side is
> > slightly higher due to host scheduler and timekeeping processing).
> >
> > You want to advance the timer by the rightmost bucket, that way you
> > guarantee lower possible latencies (which is the interest here).
>
> (Lower latencies would likely be achieved by having a timer that issues
> posted interrupts from another CPU, and the guest set to busy idle.)

Yes.

> > That said, i don't see advantage in automatic tuning for the usecase
> > which this targets.
>
> Thanks, it doesn't make much difference in the long RT setup checklist.

Exactly.

> ---
> I was asking just because I consider programming to equal automation ...
> If we know that we will always set this to the rightmost bucket anyway,
> it could be done like this
>
>   if ((s64)(delta = guest_tsc - tsc_deadline) > 0)
>   	tsc_deadline_delta += delta;
>   ...
>   advance_ns = kvm_tsc_to_ns(tsc_deadline_delta);
>
> instead of a script that runs a test and sets the variable.
> (On the other hand, it would probably have to be more complicated to
> reach the same level of flexibility.)

You'd have to guarantee the vCPUs are never interrupted by other work,
such as processing host interrupts, otherwise you could get high
increments for tsc_deadline_delta.

So to tune that value you do:

1) Boot guest.
2) Set up certain vCPUs as realtime (large checklist), which includes
   pinning and host interrupt routing.
3) Measure with cyclictest on those vCPUs with the realtime conditions.

So it's also a matter of configuration.

But yes, the code above would set advance_ns to the rightmost bucket.