On 09/24/2012 04:14 PM, Takuya Yoshikawa wrote:
> On Mon, 24 Sep 2012 12:18:15 +0200
> Avi Kivity <avi@xxxxxxxxxx> wrote:
>
>> On 09/24/2012 08:24 AM, Takuya Yoshikawa wrote:
>> > This is an RFC since I have not done any comparison with the approach
>> > using for_each_set_bit() which can be seen in Avi's work.
>> >
>> > Takuya
>> > ---
>> >
>> > We did a simple test to see which requests we would get at the same time
>> > in vcpu_enter_guest() and got the following numbers:
>> >
>> >  |...........|...............|........|...............|.|
>> >       (N)           (E)         (S)         (ES)   others
>> >      22.3%         30.7%       16.0%        29.5%    1.4%
>> >
>> > (N) : Nothing
>> > (E) : Only KVM_REQ_EVENT
>> > (S) : Only KVM_REQ_STEAL_UPDATE
>> > (ES): Only KVM_REQ_EVENT and KVM_REQ_STEAL_UPDATE
>> >
>> > * Note that the exact numbers can change for other guests.
>>
>> What was the workload?  STEAL_UPDATE is done on schedules and
>> heavyweight exit (erroneously), so it should be rare.
>
> Just ping'ing to the host.  But even without that, I got many
> STEAL_UPDATE's and KVM_REQ_EVENT's.

Those are only relative measurements.  Sure, on an idle guest practically
all of your exits are special, but there are so few of them that
optimizing them doesn't matter much.

However, the trend is that as hardware improves, exits get more and more
special.  Before NPT/EPT, most of the exits were page faults, and we
optimized for them.  Afterwards, most exits are APIC and interrupt
related, HLT, and MMIO.  Of these, some are special (HLT, interrupt
injection) and some are not (read/write of most APIC registers).  I
don't think one group dominates the other.  So vcpu->requests processing
is already not such a slow path; it is relatively common.  We still see
a lot of page faults during boot and during live migration, though.

With AVIC/APIC-v (still in the future) the mix will change again, with
both special and non-special exits eliminated.  We'll be left mostly
with the APIC timer and HLT (and ICR writes for APIC-v).

So maybe the direction of your patch makes sense.  Things like
KVM_REQ_EVENT (or anything above 2-3% of exits) shouldn't be in
vcpu->requests, or maybe they deserve special treatment.

>
>> Or maybe we're recording HLT time as steal time?
>
> Not sure.

I think we do, but on the other hand qemu doesn't even expose it, so we
can't measure it.

>
> BTW, schedule() is really rare?  We do either cond_resched() or
> heavy weight exit, no?

If 25% of exits are HLT (like a ping workload), then 25% of your exits
end up in schedule().  On modern hardware, a relatively larger
percentage of exits are heavyweight (same analysis as above).  On AVIC
hardware most exits will be MMIO, HLT, and host interrupts.  Of these,
only host interrupts that don't lead to a context switch will be
lightweight.

>
> I always see vcpu threads actively move around the cores.
> (When I do not pin them.)

Sure, but the frequency is quite low.  If not, that's a bug.

>
>> > This motivated us to optimize the following code in vcpu_enter_guest():
>> >
>> >     if (vcpu->requests) {                                   /** (1) **/
>> >         ...
>> >         if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))  /** (2) **/
>> >             record_steal_time(vcpu);
>> >         ...
>> >     }
>> >
>> >     if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>> >         ...
>> >     }
>> >
>> > - For case (E), we do kvm_check_request() for every request other than
>> >   KVM_REQ_EVENT in the block (1) and always get false.
>> > - For case (S) and (ES), the only difference from case (E) is that we
>> >   get true for KVM_REQ_STEAL_UPDATE.
>> >
>> > This means that we were wasting a lot of time on the many if branches
>> > in the block (1), even for these three trivial cases which together
>> > dominated more than 75% of the total.
>> >
>> > This patch mitigates the issue as follows:
>> > - For case (E), we change the first if condition to
>> >
>> >     if (vcpu->requests & ~(1 << KVM_REQ_EVENT))             /** (1') **/
>> >
>> >   so that we can skip the block completely.
>> > - For case (S) and (ES), we move the check (2) upwards, out of the
>> >   block (1), to clear the KVM_REQ_STEAL_UPDATE flag before doing the
>> >   check (1').
>> >
>> > Although this adds one if branch for case (N), the fact that steal
>> > updates occur frequently enough, except when we give each vcpu a
>> > dedicated core, justifies its tiny cost.
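
To make the proposed flow concrete, a minimal sketch of the reordering
described above (with every other request check elided behind comments)
would be something like:

        /* Hoisted out of block (1).  kvm_check_request() also clears the
         * KVM_REQ_STEAL_UPDATE bit, so a pending steal update alone no
         * longer forces us into the slow path below.
         */
        if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
                record_steal_time(vcpu);

        if (vcpu->requests & ~(1 << KVM_REQ_EVENT)) {           /** (1') **/
                /* ... the remaining kvm_check_request() tests ... */
        }

        if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
                /* ... inject pending events as before ... */
        }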

>> Modern processors will eliminate KVM_REQ_EVENT in many cases, so the
>> optimization is wasted on them.
>
> Then, my Nehalem server was not so modern.

Well, I was referring to APIC-v/AVIC hardware, which nobody has.  On
current hardware KVM_REQ_EVENTs are very common, so stuffing them into
the vcpu->requests slow path is not warranted.  My patch is cleaner than
yours as it handles the problem generically, but yours matches reality
better.

> I did something like this:
>
>     if requests == KVM_REQ_EVENT
>         ++counter1;
>     if requests == KVM_REQ_STEAL_UPDATE
>         ++counter2;
>     ...
>
> in vcpu_enter_guest() and saw KVM_REQ_EVENT many times.

(In theory perf probe can do this, but figuring out how is often more
time-consuming than patching the kernel.)
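
For comparison with the generic direction mentioned at the top of the
thread, a for_each_set_bit() based dispatch (a rough sketch only, not the
actual patch) would walk just the bits that are set instead of testing
every request:

        unsigned int req;

        for_each_set_bit(req, &vcpu->requests, BITS_PER_LONG) {
                /* kvm_check_request() clears the bit it tests. */
                if (!kvm_check_request(req, vcpu))
                        continue;
                switch (req) {
                case KVM_REQ_STEAL_UPDATE:
                        record_steal_time(vcpu);
                        break;
                /* ... one case per request ... */
                }
        }

Whether that beats special-casing the hot requests depends on how skewed
the request mix really is.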