On 01/18/2012 12:12 AM, Lai Jiangshan wrote:
> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
> ---
>  include/libvirt/libvirt.h.in |    5 ++++
>  python/generator.py          |    1 +
>  src/driver.h                 |    7 +++++
>  src/libvirt.c                |   51 ++++++++++++++++++++++++++++++++++++++++++
>  src/libvirt_public.syms      |    5 ++++
>  5 files changed, 69 insertions(+), 0 deletions(-)
> +
> +/**
> + * virDomainGetPcpusUsage:
> + * @dom: pointer to domain object
> + * @usages: returned physical cpu usages
> + * @nr_usages: length of @usages
> + * @flags: flags to control the operation
> + *
> + * Get the cpu usages for every physical cpu since the domain started (in nanoseconds).
> + *
> + * Returns 0 if success, -1 on error
> + */
> +int virDomainGetPcpusUsage(virDomainPtr dom,
> +                           unsigned long long *usages,
> +                           int *nr_usages,
> +                           unsigned int flags)

Overall, I think this patch series is headed in the right direction, and the idea of using the cpuacct cgroup for per-cpu usage numbers is a cool hack. However, I think this API is a bit too restricted, and a few tweaks can make it more powerful and avoid the need for adding new API in the future (instead, just adding new named parameter types to this API).

First, we don't use the term Pcpu anywhere else in libvirt. I'm thinking it would be better to name this virDomainGetCPUStats, to match our existing virNodeGetCPUStats. (Too bad that we weren't consistent between CPU vs. Vcpu in capitalization.) Also, by naming it just CPUStats rather than PcpusUsage, I think we open the door to further stats - for example, I can envision that we might want to get stats not only on how much cpu accounting has been attributed to the host qemu process, but we can also communicate with a guest agent and get the guest's own view of its cpu usage; the difference between the two values would thus be the overhead consumed by the hypervisor in presenting an environment to the guest.
Of course, we don't have to integrate with a guest agent now, but we should be designing the API with that in mind.

Next, I notice that right now, cpuacct provides the following properties: cpuacct.stat (gives elapsed user and system times for the cgroup, similar to 2 of the 3 cumulative accounting values given by the time(1) utility), cpuacct.usage_percpu (gives just nanosecond usage per cpu, without division between user or system), and cpuacct.usage (the sum of cpuacct.usage_percpu). And it is conceivable that future kernels will add even more stats under the cpuacct umbrella. But your patch only uses cpuacct.usage_percpu. Why should we be hard-coding things to just one stat?

Additionally, I'm comparing this with existing APIs, for things we have learned in the past about querying statistics. virNodeGetCPUStats is rather limited in that it can only return the stats of one processor at a time; but it has the benefit that the magic VIR_NODE_CPU_STATS_ALL_CPUS (aka -1) can provide the sum across all processors in that one call. So a machine with 8 processors requires 8 calls to the API, but you can also get the summary in one blow without doing any additions yourself. It may also be the case that some stats are available only for the entire process, rather than per-cpu. virNodeGetCPUStats also has the drawback that it can only return unsigned long long values, rather than using virTypedParameter; but at least it has the benefit that it can return an arbitrary number of name/value stat tuples.

Meanwhile, virNodeGetCellsFreeMemory allows the user to request a start and stop limit, and thus grab multiple stats in one call, but doesn't allow passing -1 to get a summary (that is, with 8 NUMA cells you can make just 1 call, but have to do the addition yourself). It has the drawback that it can only return one specific stat; it cannot be extended to other statistics.
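To make the cpuacct point concrete, here is a minimal sketch of how a driver might parse the space-separated nanosecond counters in cpuacct.usage_percpu and derive the all-cpu total that cpuacct.usage reports. The helper names are purely illustrative, not actual libvirt code:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse up to max_cpus counters from a
 * cpuacct.usage_percpu line such as
 * "45211932635804 44711067947567\n" into @usages.
 * Returns the number of counters parsed, or -1 on malformed input. */
static int
parsePcpuUsage(const char *line, unsigned long long *usages, int max_cpus)
{
    int n = 0;
    const char *p = line;

    while (n < max_cpus) {
        char *end;
        unsigned long long val;

        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || *p == '\n')
            break;
        errno = 0;
        val = strtoull(p, &end, 10);
        if (end == p || errno != 0)
            return -1;
        usages[n++] = val;
        p = end;
    }
    return n;
}

/* Sum of the per-cpu counters; this is what cpuacct.usage reports. */
static unsigned long long
sumUsage(const unsigned long long *usages, int n)
{
    unsigned long long total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += usages[i];
    return total;
}
```

With that shape, exposing additional cpuacct files later only means parsing more name/value pairs, not adding new entry points.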
And virDomainGetVcpuPinInfo is an example of a function that returns a 2-dimensional array through a single output pointer, by passing in two separate length parameters; but again with the limited output of only one specific stat.

Borrowing from these ideas, I think your API should return an array of virTypedParameter per cpu, so that we can add new stats without needing new API. It should also allow the ability to return a summary over all cpus, instead of a 2-dimensional list of statistics per cpu. I'm thinking something like the following:

/**
 * virDomainGetCPUStats:
 * @dom: domain to query
 * @params: array to populate on output
 * @nparams: number of parameters per cpu
 * @start_cpu: which cpu to start with, or -1 for summary
 * @ncpus: how many cpus to query
 * @flags: unused for now
 *
 * Get statistics relating to CPU usage attributable to a single
 * domain (in contrast to the statistics returned by
 * virNodeGetCPUStats() for all processes on the host).  @dom
 * must be running (an inactive domain has no attributable cpu
 * usage).  On input, @params must contain at least @nparams * @ncpus
 * entries, allocated by the caller.
 *
 * If @start_cpu is -1, then @ncpus must be 1, and the returned
 * results reflect the statistics attributable to the entire
 * domain (such as user and system time for the process as a
 * whole).  Otherwise, @start_cpu represents which cpu to start
 * with, and @ncpus represents how many consecutive processors to
 * query, with statistics attributable per processor (such as
 * per-cpu usage).
 *
 * As a special case, if @params is NULL and @nparams is 0, then
 * @ncpus must be 1, and the return value will be how many
 * statistics are available for the given @start_cpu.  This number
 * may be different for @start_cpu of -1 than for any non-negative
 * value, but will be the same for all non-negative @start_cpu.
 * The combination of @start_cpu and @ncpus must not exceed the
 * number of available cpus to query.
 *
 * For now, @flags is unused, and the statistics all relate to the
 * usage from the host perspective; the number of cpus available to
 * query can be determined by the cpus member of the result of
 * virNodeGetInfo(), even if the domain has had vcpus pinned to only
 * use a subset of overall host cpus.  It is possible that a future
 * version will support a flag that queries the cpu usage from the
 * guest's perspective, using the number of vcpus available to the
 * guest, found by virDomainGetVcpusFlags().  An individual guest
 * vcpu cannot be reliably mapped back to a specific host cpu unless
 * single-processor vcpu pinning was used, but when @start_cpu is -1,
 * any difference in usage between a host and guest perspective would
 * serve as a measure of hypervisor overhead.
 *
 * Returns -1 on failure, or the number of statistics that were
 * populated per cpu on success (this will be less than the total
 * number of populated @params, unless @ncpus was 1; and may be
 * less than @nparams).  The populated parameters start at each
 * stride of @nparams; any unpopulated parameters will be zeroed
 * on success.  The caller is responsible for freeing any returned
 * string parameters.
 */
int virDomainGetCPUStats(virDomainPtr dom,
                         virTypedParameterPtr params,
                         int nparams,
                         int start_cpu,
                         int ncpus,
                         unsigned int flags);

where you might have the following behavior on a 2-cpu machine where the guest has 1 vcpu allocated:

virDomainGetCPUStats(dom, NULL, 0, -1, 1, 0) => 2
virDomainGetCPUStats(dom, params, 2, -1, 1, 0) => 2, with params[0] being <"user",ull,6507945> and params[1] being <"system",ull,1817278> - that is, the times found in cpuacct.stat
virDomainGetCPUStats(dom, NULL, 0, 0, 1, 0) => 1
virDomainGetCPUStats(dom, params, 1, 0, 3, 0) => -1 (start_cpu + ncpus exceeds the available cpus)
virDomainGetCPUStats(dom, params, 2, 0, 2, 0) => 1, with params[0] being <"usage",ull,45211932635804>, params[1] being zeroed, params[2] being <"usage",ull,44711067947567>, and params[3] being zeroed

and in the future, if we add guest agent interaction via the flag:

virDomainGetCPUStats(dom, NULL, 0, 0, 1, GUEST) => 1
virDomainGetCPUStats(dom, params, 1, 0, 2, GUEST) => -1 (start_cpu + ncpus exceeds the available cpus)
virDomainGetCPUStats(dom, params, 1, 0, 1, GUEST) => 1, with params[0] being <"guestusage",ull,35211932635804> from a query of the guest agent

One particular implementation point will be that the RPC call should not transmit zeroed entries. That is, in my example where I passed in an over-allocated params, the rpc call would compress things so that the on-the-wire format transmits just return_value*ncpus blobs, and deserialization of the wire format would then expand them back into the strides of the user's params array. Another point is that you must provide reasonable bounds for both nparams (16 is probably okay, compared to our other *_PARAMETERS_MAX values in remote_protocol.x) and ncpus.
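The compression I have in mind can be sketched as follows; the struct is a stand-in for virTypedParameter and the helper names are illustrative, not the actual remote_protocol.x serializer:

```c
#include <string.h>

/* Stand-in for virTypedParameter, holding only a ull value. */
typedef struct {
    char name[80];
    unsigned long long value;
} FakeParam;

/* Serializer side: the caller's params array has a stride of nparams
 * entries per cpu, but only nstats (the API's return value) entries
 * per stride are populated, so the wire carries nstats * ncpus blobs. */
static void
compressParams(const FakeParam *params, int nparams, int ncpus,
               int nstats, FakeParam *wire)
{
    int cpu, i;

    for (cpu = 0; cpu < ncpus; cpu++)
        for (i = 0; i < nstats; i++)
            wire[cpu * nstats + i] = params[cpu * nparams + i];
}

/* Deserializer side: expand the wire blobs back into the strides of
 * the user's array, zeroing the unpopulated tail of each stride. */
static void
expandParams(const FakeParam *wire, int nstats, int ncpus,
             int nparams, FakeParam *params)
{
    int cpu, i;

    memset(params, 0, sizeof(*params) * (size_t)nparams * (size_t)ncpus);
    for (cpu = 0; cpu < ncpus; cpu++)
        for (i = 0; i < nstats; i++)
            params[cpu * nparams + i] = wire[cpu * nstats + i];
}
```

For the over-allocated example above (nparams of 2, ncpus of 2, but only 1 stat available), the wire would carry 2 entries instead of 4, and expansion restores the zeroed gaps in the caller's array.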
Others in this thread suggested a minimum bound of 4096 for ncpus, but that adds up fast when you have 92-byte virTypedParameters (the on-the-wire format transmits a nul-terminated string rather than all 80 bytes of any virTypedParameter.name field, but that still means the wire can easily be sending 32 bytes per parameter). I'd suggest 128 as the cap for ncpus (16 * 32 * 128 is 64k, which is approaching the rpc bounds we have for a single string anyway), as well as documentation that a user with more than 128 cpus would have to break things into chunks to avoid exceeding RPC limitations. Putting limits on both nparams and ncpus also avoids any potential issues with multiplication overflow causing what looks like a small product after wraparound.

If others like my thoughts, can you respin your patch series to adjust your API accordingly?

-- 
Eric Blake   eblake@xxxxxxxxxx   +1-919-301-3266
Libvirt virtualization library http://libvirt.org