On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote: > On 03/16/2010 09:48 AM, Zhang, Yanmin wrote: > > > >> Excellent, support for guest kernel != host kernel is critical (I can't > >> remember the last time I ran same kernels). > >> > >> How would we support multiple guests with different kernels? > >> > > With the patch, 'perf kvm report --sort pid" could show > > summary statistics for all guest os instances. Then, use > > parameter --pid of 'perf kvm record' to collect single problematic instance data. > > > > That certainly works, though automatic association of guest data with > guest symbols is friendlier. Thanks. Originally, I planed to add a -G parameter to perf. Such like -G 8888:/XXX/XXX/guestkallsyms:/XXX/XXX/modules,8889:/XXX/XXX/guestkallsyms:/XXX/XXX/modules 8888 and 8889 are just qemu guest pid. So we could define multiple guest os symbol files. But it seems ugly, and 'perf kvm report --sort pid" and 'perf kvm top --pid' could provide similar functionality. > > >>> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c > >>> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c 2010-03-16 08:59:11.825295404 +0800 > >>> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c 2010-03-16 09:01:09.976084492 +0800 > >>> @@ -26,6 +26,7 @@ > >>> #include<linux/sched.h> > >>> #include<linux/moduleparam.h> > >>> #include<linux/ftrace_event.h> > >>> +#include<linux/perf_event.h> > >>> #include "kvm_cache_regs.h" > >>> #include "x86.h" > >>> > >>> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct > >>> vmcs_write32(TPR_THRESHOLD, irr); > >>> } > >>> > >>> +DEFINE_PER_CPU(int, kvm_in_guest) = {0}; > >>> + > >>> +static void kvm_set_in_guest(void) > >>> +{ > >>> + percpu_write(kvm_in_guest, 1); > >>> +} > >>> + > >>> +static int kvm_is_in_guest(void) > >>> +{ > >>> + return percpu_read(kvm_in_guest); > >>> +} > >>> > >>> > >> > > > >> There is already PF_VCPU for this. > >> > > Right, but there is a scope between kvm_guest_enter and really running > > in guest os, where a perf event might overflow. Anyway, the scope is very > > narrow, I will change it to use flag PF_VCPU. > > > > There is also a window between setting the flag and calling 'int $2' > where an NMI might happen and be accounted incorrectly. > > Perhaps separate the 'int $2' into a direct call into perf and another > call for the rest of NMI handling. I don't see how it would work on svm > though - AFAICT the NMI is held whereas vmx swallows it. > I guess NMIs > will be disabled until the next IRET so it isn't racy, just tricky. I'm not sure if vmexit does break NMI context or not. Hardware NMI context isn't reentrant till a IRET. YangSheng would like to double check it. > > >>> +static struct perf_guest_info_callbacks kvm_guest_cbs = { > >>> + .is_in_guest = kvm_is_in_guest, > >>> + .is_user_mode = kvm_is_user_mode, > >>> + .get_guest_ip = kvm_get_guest_ip, > >>> + .reset_in_guest = kvm_reset_in_guest, > >>> +}; > >>> > >>> > >> Should be in common code, not vmx specific. > >> > > Right. I discussed with Yangsheng. I will move above data structures and > > callbacks to file arch/x86/kvm/x86.c, and add get_ip, a new callback to > > kvm_x86_ops. > > > > You will need access to the vcpu pointer (kvm_rip_read() needs it), you > can put it in a percpu variable. We do so now in a new patch. > I guess if it's not null, you know > you're in a guest, so no need for PF_VCPU. Good suggestion. Thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html