Hollis Blanchard wrote:
On Tue, 2008-11-11 at 16:43 +0100, Christian Ehrhardt wrote:
From: Christian Ehrhardt <ehrhardt@xxxxxxxxxxxxxxxxxx>
*update to v4*
- EMUL_CORE no longer had more than wrtee emulation, therefore it now accounts
for WRTEE in the output and set_exit_type calls are in the wrtee handlers to
let any residual core op be counted as EMULINST"
*update to v3*
- ensure build time optimization when calling exit accouting functions using
build time bug / constant check
- migrate most of the exit timing code from powerpc.c and
kvm_timing_stats.h to a separate exittiming.c file
- renamed a lot of constants used to better fit generic/core specific code
- added an accidenially removed optimization comment
- use pid of userspace process instead of an own atomic count to identify a vm
- changed loop labels in tul/tbu loops (booke_interrupt.S) to anonymous 1/1b
- removed the indirection of additional EMULATE_*_DONE types. Instead now
the exit timing supports "accounting" an exit which consists of set type and
increase kvm_stat counters. But also provides both sub-tasks as separate
functions. Using that emulation now sets a default EMUL_INST type that can
be overwritten by detailed subcategories (set_exit_type). Accouting for
kvm_stat is done with account_exit_stat for all emulinst exits together on
top level (as it was before).
*update to v2*
The update fixes accounting for sets to MSR[WE] which should not be accoutned
as instruction emulation. While adding that and analyzing the data it became
obvious that several types of emulations hould be accounted separately.
I'm not yet really happy with adding all these EMULATE_*_DONE types but I had
no better idea to not break flood the code with account calls and split
account/set_type. The issue is that emulation is also called e.g. for some mmio
paths and therefore the accouting should not be called inside emulation, but on
the returning path that can now evaluate the different kind of emulations done.
This is not yet finished, and posted as RFC only.
Other existing kvm statistics are either just counters (kvm_stat) reported for
kvm generally or trace based aproaches like kvm_trace.
For kvm on powerpc we had the need to track the timings of the different exit
types. While this could be achieved parsing data created with a kvm_trace
extension this adds too muhc overhead (at least on embedded powerpc) slowing
down the workloads we wanted to measure.
Therefore this patch adds a in kernel exit timing statistic to the powerpc kvm
code. These statistic is available per vm&vcpu under the kvm debugfs directory.
As this statistic is low, but still some overhead it can be enabled via a
.config entry and should be off by default.
Since this patch touched all powerpc kvm_stat code anyway this code is now
merged and simpliefied together with the exit timing statistic code (still
working with exit timing disabled in .config).
You forgot to include exittiming.c in this patch. Since I can't build
it, I might as well tell you the additional changes I was going to
make... :)
sigh - I hate missing such things. I should include a check via hg
status for ? files with a .c/.h type in my hg email :-/
I'll resubmit the patch as v5 including the changes you describe here
* Prefix all the functions in kvm_timing_stats.h to begin with
"kvmppc_". (I really like to keep the layering as clear as
possible.) I don't think we need to add "booke" in there, since
a hypothetical kvmppc 970 implementation could use the same
function names, just a different set of exit types.
very reasonable, and as some sort of advocacy I wanted to mention that I
already did that with some of the functions.
But yes I should be consequent and do this for all of them.
* Rename kvm_timing_stats.h. Does that need to go in the asm
directory btw? If so, call it kvm_timing.h; if not, please put
it inside arch/powerpc/kvm, and name it timing.h.
done
* Rename exittiming.c to match whatever you name the header.
done
* Use empty static functions instead of empty macros in
kvm_timing.h. (I'm not the only one who doesn't like macros; see
http://lwn.net/Articles/306045/)
I had no precedence 'til now and just used what I see most in code and
that where defines :-)
But it makes sense.
Btw - I would even change it if only you wouldn't like macros :-P
* I know it's just leftovers from previous iterations of this
patch, but drop the whitespace changes to include/asm/kvm_ppc.h
(and send as a separate patch if you like).
at least the whitespace fix is correct ;-)
I'll submit it in a separate patch
* Remove vm_id from kvm_arch (you just missed this one spot :).
done
* I don't like the debugfs file names you've chosen, but I'm not
sure what's best. Definitely make them lowercase and omit the
underscore, but the directory layout feels a little odd to me.
Maybe it should be /sys/kernel/debug/kvm/exit_times/vm52/vcpu0/
instead of /sys/kernel/debug/kvm/vm52/vcpu0/exit_times ?
btw it's no more a directory layout anymore - just a flat file
(explained below).
I can make them lower case, but the dir version is not what you want.
I had that in the past and it works nice as long as things go fine.
But I found that when you (intentionally or not) crash the process (e.g.
kill -9) the cleanup in the kvm code cleans up the vm before the vcpu's.
While initialization is vm -> vcpu.
This gives you
>vm
>vcpu's
<vm
<vcpu's
Eventually this forces you in your cleanup of the VM to iterate over the
vcpu's in your own loop, because otherwise you cant remove the vm
directory because you have stale files in it. And once the vcpu's are
cleaned the vm clean will no more be called and you have a stale dir.
However I know this could be solved, but all would add loops or
refcount/childencount checks, ...
While the simple approach to handle it in one file is just better to
read in code and less bug prone.
I changed it to lower case with only a underscore between vm/vcpu/timing
no more between entity and their numbers (vcpu_1 -> vcpu1)
* Remove atomic.h and seq_file.h from arch/powerpc/kvm/powerpc.c
done and removed debugfs.h as c cod is now in a separate file.
* Slight edits for Kconfig:
config KVM_EXIT_TIMING
bool "Detailed exit timing"
depends on KVM
---help---
Calculate elapsed time for every exit/enter cycle. A per-vcpu
report is available in debugfs kvm/vm###/vcpu###_exit_timing.
The overhead is relatively small, however it is not recommended for
production environments.
If unsure, say N.
included with a modification to the debugfs layout where the data can be
found
A lot of the naming stuff is just personal preference, but in general I
feel like the names you chose are too verbose and inconsistent. I've
found that name changes are easiest to do in the unapplied patch itself
with global find/replace.
I'm happy with the rest of what I can see. :) If I have comments after
you've left for vacation, I'll apply those on top as a separate patch.
This patch is really important, and I feel like we're almost there...
I hope it now feels even closer
Thanks very much for http://kvm.qumranet.com/kvmwiki/PowerPC_Exittimings
too; that's critical data.
I hope to get blktrace data onto that pages today too.
Thanks for your review Hollis, you should find the updated patch in your
inbox and on kvm-ppc soon.
--
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html