Hello,

I'm trying to learn how the Performance Monitoring Unit (PMU) works on Intel Core x86 systems. As an exercise, I want to write a kernel module that counts the CPU cycles consumed by a process whose PID is passed as a module parameter. perf can already do the same thing:

    perf stat -e cycles ./testProgram

The module init function has the following program flow:

1. Register an NMI handler function, 'pmc_handler', to detect when a counter overflows.

    apic_write(APIC_LVTPC, APIC_DM_NMI);
    register_nmi_handler(NMI_LOCAL, pmc_handler, 0, "perf_handler");

2. On every online CPU, write 0x0 to the relevant MSRs.

    for_each_online_cpu(cpu) {
        wrmsr64_safe_on_cpu(cpu, IA32_PERF_GLOBAL_CTRL, 0x0);
        wrmsr64_safe_on_cpu(cpu, IA32_PERF_GLOBAL_OVF_CTRL, 0x0);
        wrmsr64_safe_on_cpu(cpu, IA32_PERF_FIXED_CTR_CTRL, 0x0);
        wrmsr64_safe_on_cpu(cpu, IA32_PMC0, 0x0);
        wrmsr64_safe_on_cpu(cpu, IA32_PERFEVTSEL0, 0x0);
    }

3. On every online CPU:

   a. Write 1 to bits 0 and 62 of IA32_PERF_GLOBAL_OVF_CTRL, which clears the corresponding overflow status bits.

    wrmsr64_safe_on_cpu(cpu, IA32_PERF_GLOBAL_OVF_CTRL, (0x1 << 0) | ((u64) 0x1 << 62));

   b. Program IA32_PERFEVTSEL0 to count unhalted CPU cycles in both user and OS mode, with the counter and the overflow interrupt enabled.

    wrmsr64_safe_on_cpu(cpu, IA32_PERFEVTSEL0, (u64) INST_UNHALTED | INT_ENABLE | COUNTER_ENABLE | USR_MODE | OS_MODE);

   c. Write (u64)-999 to the IA32_PMC0 counter.

    wrmsr64_safe_on_cpu(cpu, IA32_PMC0, counterVal);

I'm not sure how step 3c works. My guess is that it makes the counter overflow more often. I could really use an explanation of this part.

Since the module is supposed to count the CPU cycles of one particular process, it needs to know when that process is running and when it is waiting. To get that information, I have added a hook in the __schedule() function in kernel/sched/core.c. To find out when the process terminates, I have added a hook in the do_exit() function in kernel/exit.c.
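One way to read step 3c is sketched below as a small user-space C model, not as kernel code. It assumes the general-purpose counter is 48 bits wide (the real width is reported by CPUID leaf 0xA), and the helper names pmc_preload and events_until_overflow are made up for illustration. Under that assumption, preloading the counter with -999 leaves it 999 events short of wrapping, so the overflow NMI would fire after 999 counted events.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48-bit general-purpose counter; check CPUID leaf 0xA on real hardware. */
#define PMC_WIDTH 48
#define PMC_MASK  ((UINT64_C(1) << PMC_WIDTH) - 1)

/* Hypothetical helper: the value the counter holds after being preloaded with -period. */
uint64_t pmc_preload(uint64_t period)
{
    assert(period > 0 && period <= PMC_MASK);
    return (0 - period) & PMC_MASK;   /* two's complement, truncated to the counter width */
}

/* Hypothetical helper: how many more events the counter can take before it wraps to 0. */
uint64_t events_until_overflow(uint64_t counter)
{
    return (PMC_MASK - (counter & PMC_MASK)) + 1;
}
```

With these definitions, pmc_preload(999) is 0xFFFFFFFFFC19 and events_until_overflow(pmc_preload(999)) is 999, whereas a counter left at 0 would need 2^48 events before it overflows.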
Now, whenever our process is scheduled onto a CPU, I enable the PMU counters by writing 0x1 to IA32_PERF_GLOBAL_CTRL on every CPU. Similarly, whenever our process is scheduled out, I disable the PMU counters by writing 0x0 to IA32_PERF_GLOBAL_CTRL. This setup should count the CPU cycles of only the process we are interested in.

At this point we have a system where the counters are enabled only while our process is running on a CPU. Also, whenever the counter overflows, the pmc_handler function is called (because of the registration in step 1).

In the pmc_handler function, I do the following:

1. Disable the counters temporarily before reading from them.

    wrmsr64_safe_on_cpu(cpu, IA32_PERF_GLOBAL_CTRL, 0x0);

2. Increase the total count by the value of the counter.

    totalCount += read_msrs_on_cpu(cpu, IA32_PMC0);

3. Enable the counters again.

    wrmsr64_safe_on_cpu(cpu, IA32_PERF_GLOBAL_CTRL, 0x1);

When the process terminates, i.e. when the exit hook is called for our process, I output the value of totalCount.

This setup seems correct to me, but the value of totalCount and the value perf gives me are vastly different: about 56,000 from my module vs. about 700,000 from perf. (For perf I'm reading r003c, which is the same as INST_UNHALTED (0x003c) as per the Intel Core performance manual.)

Can anyone please help me with this? I've been stuck on it for quite some time. I suspect there could be a conceptual flaw in the whole setup.

Thanks and regards,
Anubhav Sharma
(Kernel Novice)
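To make the overflow accounting concrete, here is a small user-space C model. It is not the module code, and it deliberately uses a different bookkeeping from the handler described above: each overflow credits one full period to the total and re-arms the counter with -period, and the events counted since the last re-arm are added at exit. The 48-bit counter width, the struct pmc_model type and all pmc_model_* names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48-bit general-purpose counter. */
#define PMC_WIDTH 48
#define PMC_MASK  ((UINT64_C(1) << PMC_WIDTH) - 1)

/* Hypothetical model of one counter plus the running total. */
struct pmc_model {
    uint64_t counter;  /* stands in for IA32_PMC0 */
    uint64_t period;   /* events between overflows, e.g. 999 */
    uint64_t total;    /* stands in for totalCount */
};

void pmc_model_init(struct pmc_model *m, uint64_t period)
{
    assert(period > 0 && period <= PMC_MASK);
    m->period = period;
    m->counter = (0 - period) & PMC_MASK;  /* preload with -period */
    m->total = 0;
}

/* Models the overflow handler: credit one period, then re-arm the counter. */
static void pmc_model_overflow(struct pmc_model *m)
{
    m->total += m->period;
    m->counter = (0 - m->period) & PMC_MASK;
}

/* Feed `events` counted events into the model, firing overflows as needed. */
void pmc_model_count(struct pmc_model *m, uint64_t events)
{
    while (events > 0) {
        uint64_t left = PMC_MASK - m->counter + 1;  /* events until wrap */
        if (events < left) {
            m->counter += events;
            return;
        }
        events -= left;
        pmc_model_overflow(m);
    }
}

/* Models the exit hook: total plus the events counted since the last re-arm. */
uint64_t pmc_model_finish(const struct pmc_model *m)
{
    uint64_t start = (0 - m->period) & PMC_MASK;
    return m->total + (m->counter - start);
}
```

In this model, feeding 700,000 events with a period of 999 produces 700 overflows (699,300 events credited in the handler) and a residual of 700 that only shows up when pmc_model_finish() is called. Whether the same bookkeeping explains the gap seen in the real module is a separate question that the model does not answer.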