Re: exit timing analysis v1 - comments&discussions welcome

Hollis Blanchard wrote:
On Wed, 2008-10-08 at 15:49 +0200, Christian Ehrhardt wrote:
Wondering about that 30.5% for postprocessing and kvmppc_check_and_deliver_interrupts, I quickly checked it in detail - part d is now divided into 4 subparts. I also looked at the return-to-guest path to see whether the expected part (restoring the TLB) really is the main time eater there. The result clearly shows that it is.

more detailed breakdown:
a)  10.94% - exit, saving guest state (booke_interrupt.S)
b)   8.12% - reaching kvmppc_handle_exit
c)   7.59% - syscall exit is checked and an interrupt is queued via kvmppc_queue_exception
d1)  3.33% - some checks common to all exits
d2)  8.29% - finding first bit in kvmppc_check_and_deliver_interrupts
d3) 17.20% - can_deliver / clear & deliver exception in kvmppc_check_and_deliver_interrupts
d4)  4.47% - updating kvm_stat statistics
e)   6.13% - returning from kvmppc_handle_exit to booke_interrupt.S
f1) 29.18% - restoring guest TLB
f2)  4.69% - restoring guest state ([s]regs)

These fractions are % of our ~12µs syscall exit.
=> restoring the TLB on each reentry = ~4µs constant overhead
=> next, look a bit into IRQ delivery and other constant costs like kvm_stat updating

...
Now I'll go for the TLB restore in f1.

Hang on... does d3 make sense to you? It doesn't to me, and if there's a
bug there it will be easier to fix than rewriting the TLB code. :)
I haven't given up on improving that part either :-)
I think your core runs at 667MHz, right? So that's 1.5 ns/cycle. 17.20%
of 12µs is 2064ns, or about 1300 cycles. (Check my math.)
I get the same results. 1% ~ 80 cycles.
Now when I look at kvmppc_core_deliver_interrupts(), I'm not sure where
that time is going. We're assuming the find_first_bit() loop usually
executes once, for syscall. Does it actually execute more than that? I
don't expect any of kvmppc_can_deliver_interrupt(),
kvmppc_booke_clear_exception(), or kvmppc_booke_deliver_interrupt() to
take lots of time.
You can see below that I already had a more detailed breakdown in my old mail:
[...]
d2)  8.84% -  8.56% -  9.28% -  8.31% finding first bit in kvmppc_check_and_deliver_interrupts
d3)  6.53% -  5.25% -  6.63% -  5.10% can_deliver in kvmppc_check_and_deliver_interrupts
d4) 13.66% - 15.37% - 14.12% - 14.92% clear & deliver exception in kvmppc_check_and_deliver_interrupts
[...]
Could it be cache effects? exception_priority[] and priority_exception[]
are 16 bytes each, and our L1 cacheline is 32 bytes, so they should both
fit into one... except they're not aligned.
I would be so happy if I had hardware performance counters for things like cache misses :-)
Also, it looks like we use the generic find_first_bit(). That may be
more expensive than we'd like. However, since
vcpu->arch.pending_exceptions is a single long (not an arbitrary sized
bitfield), we should be able to use ffs() instead, which has an
optimized PowerPC implementation. That might help a lot.
Good idea. I'll check this along with some other small improvements I have in mind.

We might even be able to replace find_next_bit() too, by shifting a mask
over each loop, but I don't think we'll have to, since I expect the
common case to be we can deliver the first pending exception. (Worth
checking? :)
I'm not sure. It's surely worth checking how often that second find_next_bit() is called.
If that number turns out to be very small, it's not worth it.

--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
