Re: CPU Utilization

Rene Herman <rene.herman@xxxxxxxxxxxx> · Sat, 26 Apr 2008 00:13:29 +0200

On 25-04-08 22:18, Rafael Almeida wrote:

> I want to know how does the kernel extract CPU utilization info from my
> x86 CPU. I figure this is the place where it gets the data:
> http://lxr.linux.no/linux/fs/proc/proc_misc.c#L441 but I don't know where
> it gets the data from. I've tried following the kstat_cpu link, but it
> didn't get me to anywhere I found useful or understandable.

The trouble will be understanding per-CPU data.

If you access global data on an SMP system (or a preemptive system, but let's
stick with SMP for this discussion) you no doubt know that you need to be
careful about atomicity -- you need locking around the access if the access
itself isn't inherently atomic to begin with. Locking makes a CPU wait around
until another CPU is done before going ahead with the access, which does
of course waste time and can become a real bottleneck when it's a frequently
accessed piece of global data.

Enter per-CPU data, which means each CPU gets its own, private instance of 
the same global variable which it can then access without any locking needed 
and without keeping other CPUs from going ahead and accessing _their_ own 
private instance as well (again and as usual, PREEMPT counts as SMP here). 
Only when you need to, you then pull all the per-CPU instances of that same 
variable together; generally it's a counter and the pulling together 
consists of adding them for a global total.
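
To make that concrete, here's a minimal userspace sketch of the idea. It is
not kernel code: NR_CPUS is fixed at a made-up value, the "CPU" is just an
index passed in by the caller, and count_event() and total_events() are
hypothetical names used only for this illustration.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

/* One private counter per CPU; in this simplified model, a plain array. */
static unsigned long events[NR_CPUS];

/*
 * Called on behalf of CPU `cpu` to bump its own instance. No lock is
 * taken, because in this model no other CPU ever writes this slot.
 */
static void count_event(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	events[cpu]++;
}

/* Pull the per-CPU instances together into a global total. */
static unsigned long total_events(void)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += events[cpu];
	return sum;
}
```

The fast path (count_event) touches only the caller's own slot; only the
rare reader (total_events) walks all of them.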

This is what is happening here. Each CPU keeps a struct kernel_stat as a 
per-CPU variable. In essence it's just a global:

(*)	struct kernel_stat kstat[NR_CPUS];

The fact that in practice it's (much) less straightforward than this is
just an optimization. The linker gets involved to put per-CPU data in its
own section so that it can be laid out so as to have the different CPU
instances of a single variable in different CPU cache lines. If, for example,
the instances for CPU 0 and 1 shared a cacheline, then each time CPU 0
updated its instance, CPU 1's cached copy of that line would go stale and
have to be refetched, even though CPU 1's instance _itself_ was still
perfectly fine. This slows things down a lot; CPU cache optimizations are
without a doubt the most important optimizations in modern computer systems
due to the huge speed penalty of cache misses.
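
The cache line point can be illustrated in plain C11. This is only a sketch
of the layout concern, not of how the kernel's per-CPU section works: the
64-byte CACHE_LINE_SIZE is an assumption (it varies by CPU), and
struct padded_counter and same_cache_line() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4		/* hypothetical CPU count */
#define CACHE_LINE_SIZE 64	/* assumed line size; differs between CPUs */

/*
 * Aligning the member to the line size also pads the struct to a full
 * line, so neighbouring array elements never share a cache line.
 */
struct padded_counter {
	_Alignas(CACHE_LINE_SIZE) unsigned long value;
};

static struct padded_counter counters[NR_CPUS];

/* Returns nonzero if both addresses fall in the same (assumed) line. */
static int same_cache_line(const void *a, const void *b)
{
	return (uintptr_t)a / CACHE_LINE_SIZE ==
	       (uintptr_t)b / CACHE_LINE_SIZE;
}
```

With a bare unsigned long array, the slots for CPU 0 and CPU 1 would sit
next to each other in one line; with the padded struct, each slot gets a
line of its own at the cost of some wasted memory.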

Let's just pretend it's as simple as (*) though and that we have:

	#define per_cpu(kstat, cpu) kstat[cpu]

with

	#define kstat_cpu(cpu) per_cpu(kstat, cpu)

as now, and you'll be able to make more sense of things: kstat_cpu(i) refers
to the i'th CPU's kernel_stat structure.

Moreover, kstat_cpu(i).cpustat refers to the struct cpu_usage_stat that's 
embedded in kernel_stat and this is where the scheduler keeps track of 
where/how the CPU spent its time. See specifically account_user_time() and 
account_system_time() in kernel/sched.c.

The basic answer therefore is "the scheduler keeps track of this" (details 
are in the code) and only the per-CPU stuff makes it a little obscure.
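
Putting the pretend version together, a rough userspace model could look
like the following. Everything here is simplified and partly made up: the
real struct cpu_usage_stat has more fields than the two shown,
account_tick() is a hypothetical stand-in for account_user_time() and
account_system_time() (which have different signatures), and sum_cpustat()
is a hypothetical helper that only mimics the summing loop in the
/proc/stat code.

```c
#include <assert.h>

#define NR_CPUS 2	/* hypothetical CPU count for this sketch */

/* Simplified: only two of the fields the real structure carries. */
struct cpu_usage_stat {
	unsigned long long user;
	unsigned long long system;
};

struct kernel_stat {
	struct cpu_usage_stat cpustat;
};

/* The "pretend" per-CPU variable: a plain global array. */
static struct kernel_stat kstat[NR_CPUS];

#define per_cpu(var, cpu)	((var)[cpu])
#define kstat_cpu(cpu)		per_cpu(kstat, cpu)

/* Hypothetical: charge one tick on `cpu` to user or system time. */
static void account_tick(int cpu, int user_mode)
{
	struct cpu_usage_stat *cpustat = &kstat_cpu(cpu).cpustat;

	if (user_mode)
		cpustat->user++;
	else
		cpustat->system++;
}

/* Hypothetical: add up every CPU's instance for a system-wide total. */
static struct cpu_usage_stat sum_cpustat(void)
{
	struct cpu_usage_stat total = { 0, 0 };

	for (int i = 0; i < NR_CPUS; i++) {
		total.user += kstat_cpu(i).cpustat.user;
		total.system += kstat_cpu(i).cpustat.system;
	}
	return total;
}
```

The writer side only ever touches its own CPU's structure; the reader side
loops over all CPUs, which is the same shape as the loop you found in
proc_misc.c.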

Hope this helps.

Rene.
