Hi Daniel,
thanks for confirming that I'm on the right track. But I still
experience problems with a heavily stressed node. Let me first
explain my current node setup:
<xm info>
release : 2.6.18-1.2835.slc4xen
version : #1 SMP Wed Nov 29 21:05:58 CET 2006
machine : i686
nr_cpus : 2
nr_nodes : 1
sockets_per_node : 2
cores_per_socket : 1
threads_per_core : 1
cpu_mhz : 2800
total_memory : 2047
xen_major : 3
xen_minor : 0
xen_extra : .3-rc5-1.2835.s
xen_caps : xen-3.0-x86_32p
xen_pagesize : 4096
</xm info>
<xm vcpu-list>
Name      ID  VCPUs  CPU  State  Time(s)  CPU Affinity
Domain-0   0      0    0  r--     9761.7  any cpu
Domain-0   0      1    1  ---    10571.9  any cpu
stornode   2      0    1  r--     7287.6  1
stornode   2      1    1  ---     6473.1  1
worknode   3      0    0  ---     2139.3  0
worknode   3      1    0  ---     1368.2  0
worknode   3      2    0  ---     1223.1  0
worknode   3      3    0  ---     1349.5  0
</xm vcpu-list>
On every virtual CPU of every domain I'm running a CPU stress tool
called cpuburn. Now I let my small sensor calculate the CPU
utilisation of the whole node: I calculate the CPU utilisation of
each domain, one after another, and then sum the results to get the
node value.
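
The summation described above can be sketched as follows. This is only
an illustration, not the sensor's actual code: the function name and
parameters are hypothetical, cpuTime is taken to be in nanoseconds as
in virDomainInfo, and dividing by the number of physical CPUs
(nr_cpus from xm info) is an assumption about the scale the node
value should have.

```c
#include <stddef.h>

/* Sum the cpuTime deltas of all domains and express them as a share of
 * the CPU time the node could have delivered in the same interval.
 * All times are in nanoseconds. Scaling by nr_cpus is an assumption. */
double node_cpu_percent(const unsigned long long *old_ns,
                        const unsigned long long *cur_ns,
                        size_t ndomains,
                        unsigned long long elapsed_ns,
                        unsigned int nr_cpus)
{
    unsigned long long used_ns = 0;
    size_t i;

    if (elapsed_ns == 0 || nr_cpus == 0)
        return 0.0;

    for (i = 0; i < ndomains; i++) {
        /* Ignore a domain whose counter went backwards (e.g. restarted). */
        if (cur_ns[i] >= old_ns[i])
            used_ns += cur_ns[i] - old_ns[i];
    }

    return 100.0 * (double)used_ns / ((double)elapsed_ns * nr_cpus);
}
```

Note that this uses one elapsed time for all domains. If the domains
are sampled one after another and each sample is slow, a per-domain
elapsed time would probably be the safer choice.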
In the described stress situation it takes about 4 seconds on
average to make the following two function calls, which provide me
with the cpuTime of a domain:
dom_old = virDomainLookupByID(conn_old, listOfDomains[i]);
ret = virDomainGetInfo(dom_old, &info_old);
Here are the stats from my latest measurement:
           old cpuTime        new cpuTime
Domain-0:  3s 4294835190ms    3s 4294849513ms
stornode:  5s 580501ms        6s 4294550691ms
worknode:  6s 4294546809ms    5s 582761ms
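
A note on the large values in this table: numbers such as 4294835190
are what a negative tv_usec difference looks like when printed as an
unsigned 32-bit integer (4294967296 - 4294835190 = 132106), so the
first Domain-0 entry would be roughly 3 s - 0.132 s = 2.87 s. That is
an inference from the numbers only; the code that produced them is not
shown here. Assuming the values come from subtracting two struct
timeval readings, a subtraction with borrow could look like the
sketch below (the helper names are made up for illustration).

```c
#include <sys/time.h>

/* Return end - start as a normalised timeval (0 <= tv_usec < 1000000).
 * Assumes end is not earlier than start. */
struct timeval timeval_diff(struct timeval start, struct timeval end)
{
    struct timeval d;

    d.tv_sec = end.tv_sec - start.tv_sec;
    d.tv_usec = end.tv_usec - start.tv_usec;
    if (d.tv_usec < 0) {
        /* Borrow one second so the microsecond part is never negative. */
        d.tv_sec -= 1;
        d.tv_usec += 1000000;
    }
    return d;
}

/* Convert a normalised timeval to seconds as a double. */
double timeval_to_seconds(struct timeval t)
{
    return (double)t.tv_sec + (double)t.tv_usec / 1e6;
}
```

Printing tv_usec with a signed format (for example %ld after a cast to
long) also avoids the wrap-around in the output.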
This latency leads to results for the node's CPU utilisation that
are much lower (around 75%) than the real value (100%).
One solution would be to add the time measured for making those
calls to the used cpuTime. But this in turn can produce values that
are too high, because I don't really know at which point in time the
value is written to the structure.
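
Since it is unknown when exactly within the call the value is taken,
one option is to bracket each call with two wall-clock readings and
report a range instead of a single number. The sketch below shows
that idea; it is not something libvirt provides, and the struct and
function names are hypothetical. All times are in nanoseconds.

```c
/* One cpuTime sample, bracketed by two wall-clock readings. */
struct cpu_sample {
    unsigned long long t_before_ns; /* clock just before virDomainGetInfo */
    unsigned long long t_after_ns;  /* clock just after it returned */
    unsigned long long cpu_ns;      /* cpuTime reported by the call */
};

struct util_bounds {
    double low;  /* percent, assuming the longest possible interval */
    double high; /* percent, assuming the shortest possible interval */
};

/* Returns 0 on success, -1 if the samples are inconsistent or overlap. */
int utilisation_bounds(const struct cpu_sample *old,
                       const struct cpu_sample *cur,
                       struct util_bounds *out)
{
    unsigned long long used, longest, shortest;

    if (old->t_after_ns < old->t_before_ns ||
        cur->t_after_ns < cur->t_before_ns ||
        cur->t_before_ns <= old->t_after_ns ||
        cur->cpu_ns < old->cpu_ns)
        return -1;

    used = cur->cpu_ns - old->cpu_ns;
    /* The true interval lies somewhere between these two extremes. */
    longest = cur->t_after_ns - old->t_before_ns;
    shortest = cur->t_before_ns - old->t_after_ns;

    out->low = 100.0 * (double)used / (double)longest;
    out->high = 100.0 * (double)used / (double)shortest;
    return 0;
}
```

With calls that take several seconds and a sleep of only one second,
the two bounds end up far apart, which shows how little a single
figure means in that situation. A longer sleep between the two samples
narrows the range.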
Nevertheless, xentop always shows me the correct CPU utilisation for
each of my domains. So I conclude that this problem must have
something to do with the libvirt API.
Have you or has anybody else experienced similar issues? Do you know
of any solution?
Cheers,
Jan
On 10.05.2007, at 18:32, Daniel P. Berrange wrote:
On Thu, May 10, 2007 at 05:41:33PM +0200, Jan Michael wrote:
Hi everyone,
using libvirt I'm trying to calculate the CPU utilization of a node
in percent. But sometimes values beyond 100.0% are calculated.
This is because a domain spent more time on a CPU than has elapsed
in the meantime.
A short explanation of how CPU utilization is computed in my
case:
1. - open two connections with
       conn_cur/conn_old = virConnectOpenReadOnly(NULL);
2. - get current time
       gettimeofday(&time_old, NULL);
   - get domain by id with
       dom_old = virDomainLookupByID(conn_old, id);
   - get domain information
       virDomainGetInfo(dom_old, &info_old);
3. - sleep a second
4. - do the same as in 2., but with _cur
5. - compute CPU utilization by dividing the used cputime by the
     elapsed time and multiplying by 100
Am I right to suppose that cpuTime in the _virDomainInfo structure
is acquired directly from the hypervisor in
virDomainGetInfo(dom_old, &info_old), or is it already present when
the domain itself is looked up? Is there a better, more precise way
of doing this?
This is the best approach - the algorithm you summarized is basically
the same as I use in virt-manager. The reason it sometimes goes above
100% is just due to timing / scheduler variations:
1. get timeofday
2. get cputime for domA
3. sleep a while
4. get timeofday
5. get cputime for domA
We're basically looking at the ratio of 5-2 against 4-1. It would
be 100% accurate if you could guarantee no time elapsed between
steps 1 & 2, or between steps 4 & 5, but there's always some latency
in there, so occasionally you might end up calculating a value that
is a tiny bit over 100%. In virt-manager I deal with this by simply
rounding down to 100 if this occurs.
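
The five steps and the rounding can be sketched like this. It is a
minimal illustration and not the virt-manager code: the function name
is made up, cpuTime is taken to be in nanoseconds as in virDomainInfo,
the timestamps are the two gettimeofday readings, and the 100% cap
assumes the result is meant on a single-CPU scale.

```c
#include <sys/time.h>

/* Percentage of one CPU used by a domain between two samples.
 * cpu_*_ns : cpuTime from virDomainGetInfo (steps 2 and 5)
 * t_*      : gettimeofday readings (steps 1 and 4) */
double domain_cpu_percent(unsigned long long cpu_old_ns,
                          unsigned long long cpu_cur_ns,
                          struct timeval t_old,
                          struct timeval t_cur)
{
    /* Work in 64-bit microseconds so a negative tv_usec difference
     * is absorbed by the seconds part. */
    long long elapsed_us =
        (long long)(t_cur.tv_sec - t_old.tv_sec) * 1000000LL +
        (long long)(t_cur.tv_usec - t_old.tv_usec);
    double percent;

    if (elapsed_us <= 0 || cpu_cur_ns < cpu_old_ns)
        return 0.0;

    /* cpuTime is in nanoseconds, elapsed time in microseconds. */
    percent = 100.0 * (double)(cpu_cur_ns - cpu_old_ns) /
              ((double)elapsed_us * 1000.0);

    /* Latency between the clock reading and the cpuTime reading can
     * push the result slightly over 100, so cap it. */
    return percent > 100.0 ? 100.0 : percent;
}
```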
Based on the hypercalls which are available to us, I don't see any
way to avoid this scenario. Then again, it is not like we really need
millisecond precision in calculating CPU usage, so I don't think it's
a problem worth worrying about too much.
And another general question:
The monitoring utility of Xen, called xentop, also provides
statistics about networking and vbds. Are there any plans to provide
these values through libvirt in the future?
I'd like to see the ability to track network & disk I/O stats.
No one has so far stepped forward to suggest an API or implementation,
but I'd welcome anyone interested in taking a look at this area.
Regards,
Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|