Re: [RFC][scale] new API for querying domains stats


 



On 01.07.2014 11:33, Daniel P. Berrange wrote:
On Tue, Jul 01, 2014 at 11:19:04AM +0200, Michal Privoznik wrote:
On 01.07.2014 09:09, Francesco Romani wrote:
Hi everyone,

I'd like to discuss possible APIs and plans for new query APIs in libvirt.

I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM;
VDSM is the node management daemon, which is in charge of, among many other things,
gathering host statistics and per-Domain/VM statistics.

Right now we aim for a number of VMs per node in the (few) hundreds, but we have big plans
to scale much further, possibly reaching thousands in the not so distant future.
At the moment we use one thread per VM to gather the VM stats (CPU, network, disk),
and this obviously scales poorly.

I think this is your main problem. Why not have only one thread that would
manage the list of domains to query and issue the APIs periodically, instead of
having one thread per domain?

You suffer from the round trip time on every API call if you serialize it all
in a single thread. E.g. if every API call takes 50ms and you want to check
once per second, you can only monitor 20 VMs before you take more time than
you have available. This really sucks when the majority of that 50ms is a
sleep in poll() waiting for the RPC response.

Unless you have a bulk query API, which would incur the RTT only once ;)
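
To make the "RTT only once" point concrete, here is a minimal sketch of how such a
bulk call might be consumed from Python. The getAllDomainStats() name, the flag values
and the (domain, stats) return shape are all assumptions made up for illustration,
not an existing libvirt API:

  import libvirt

  # A *hypothetical* bulk entry point, named getAllDomainStats() here purely
  # for illustration; neither the call, the STATS_* flags below, nor the
  # (domain, stats-dict) return shape is part of the API being discussed yet.
  STATS_CPU, STATS_BLOCK, STATS_INTERFACE = 1, 2, 4   # placeholder flag values

  def sample_all(conn):
      # One RPC round trip for the whole host instead of one per domain.
      results = conn.getAllDomainStats(STATS_CPU | STATS_BLOCK | STATS_INTERFACE)
      return dict((dom.name(), stats) for dom, stats in results)

  conn = libvirt.open('qemu:///system')
  all_stats = sample_all(conn)   # would only work if such an API existed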


This is made only worse by the fact that VDSM is a Python 2.7 application, and Python 2.x
notoriously behaves badly with threads. We are already working to improve our code,
but I'd like to bring the discussion here and see if and when the querying API can be improved.

We currently use these APIs for our sampling (a rough sketch of one sampling pass follows the list):
   virDomainBlockInfo
   virDomainGetInfo
   virDomainGetCPUStats
   virDomainBlockStats
   virDomainBlockStatsFlags
   virDomainInterfaceStats
   virDomainGetVcpusFlags
   virDomainGetMetadata
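
For reference, one sampling pass over a single domain with the calls above looks
roughly like this in libvirt-python; the device names ('vda', 'vnet0') and the
metadata namespace URI are placeholders, and error handling is omitted:

  import libvirt

  def sample_domain(dom):
      # Roughly one sampling pass, as done today from a per-VM thread.
      # 'vda', 'vnet0' and the metadata namespace URI are placeholder
      # names; the real ones come from the domain XML / oVirt config.
      return {
          'block_info': dom.blockInfo('vda'),
          'info':       dom.info(),
          'cpu':        dom.getCPUStats(True),
          'block':      dom.blockStats('vda'),
          'block_fl':   dom.blockStatsFlags('vda'),
          'net':        dom.interfaceStats('vnet0'),
          'vcpus':      dom.vcpusFlags(libvirt.VIR_DOMAIN_VCPU_CURRENT),
          'metadata':   dom.metadata(libvirt.VIR_DOMAIN_METADATA_ELEMENT,
                                     'http://ovirt.org/vm/tune/1.0'),
      }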

What we'd like to have is

* asynchronous APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
   This would be just awesome. Either a single callback or a different one per call is fine
   (let's discuss this!).
   Please note that we are much more concerned about thread reduction than about raw
   performance numbers. We have had reports of the thread count becoming a real problem,
   while performance so far is not a concern (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54).
   (A purely speculative sketch of what such an async API might look like follows below.)
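
For discussion only, here is one possible per-call-callback shape as seen from Python;
the getInfoAsync() name and the callback signature are invented, not proposed libvirt API:

  # Purely speculative: getInfoAsync() and this callback signature do not
  # exist in libvirt; the shape is invented only to frame the discussion
  # of "single callback vs. one per call".
  def on_stats(dom, stats, opaque):
      # Would run from the event loop thread when the reply for 'dom' arrives.
      opaque[dom.name()] = stats

  def request_all(doms, results):
      for dom in doms:
          dom.getInfoAsync(on_stats, results)   # returns immediately, no worker thread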

I'm not a big fan of this approach. I mean, IIRC Python has the Global Interpreter
Lock (GIL), which effectively prevents two threads from running Python code concurrently.
So while in C this would make perfect sense, it doesn't in Python. The callbacks
would be called from the event loop which, given how frequently you poll the
info, will block the other threads. Therefore I'm afraid the approach would not
bring any speed-up, but rather a slowdown.

I'm not sure I agree with your assessment here. If we consider a single
API call, the time it takes to complete is made up of a number of parts;
when calls for many VMs are serialized in one thread, those parts simply
repeat back to back:

  1. Time to write() the RPC call to the socket
  2. Time for libvirtd to process the RPC call
  3. Time to recv() the RPC reply from the socket

  1. Time to write() the RPC call to the socket
  2. Time for libvirtd to process the RPC call
  3. Time to recv() the RPC reply from the socket

  1. Time to write() the RPC call to the socket
  2. Time for libvirtd to process the RPC call
  3. Time to recv() the RPC reply from the socket
  ...and so on..

If the time for item 2 dominates over the time for items 1 & 3 (which
it really should), then the client thread is going to be sleeping in
poll() for the bulk of the duration of the libvirt API call. If we had
an async API mechanism, then the VDSM time would essentially be consumed
with

  1. Time to write() the RPC call to the socket
  2. Time to write() the RPC call to the socket
  3. Time to write() the RPC call to the socket
  4. Time to write() the RPC call to the socket
  5. Time to write() the RPC call to the socket
  6. Time to write() the RPC call to the socket
  7. wait for replies to start arriving
  8. Time to recv() the RPC reply from the socket
  9. Time to recv() the RPC reply from the socket
  10. Time to recv() the RPC reply from the socket
  11. Time to recv() the RPC reply from the socket
  12. Time to recv() the RPC reply from the socket
  13. Time to recv() the RPC reply from the socket

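A back-of-the-envelope model of the two schedules, using the 50ms per-call figure
from earlier in the thread, shows why this matters; the 1ms/49ms split and the
assumption that libvirtd processes the outstanding calls concurrently are both
made up for illustration:

  # Toy latency model, not a benchmark. It assumes the 50ms/call figure
  # quoted earlier, arbitrarily split into 1ms of client-side write()/recv()
  # plus 49ms waiting for libvirtd, and assumes libvirtd can work on the
  # outstanding requests concurrently.
  NUM_VMS   = 100
  IO_MS     = 1     # client write() + recv() per call (assumed)
  SERVER_MS = 49    # time waiting for libvirtd's reply (assumed)

  serialized = NUM_VMS * (IO_MS + SERVER_MS)   # 5000ms: blows a 1s polling budget
  pipelined  = NUM_VMS * IO_MS + SERVER_MS     # 149ms: the waits overlap
  print("serialized: %dms  pipelined: %dms" % (serialized, pipelined))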

Well, in the async form you also need to account for the time spent in the callbacks:

1. write(serial=1, ...)
2. write(serial=2, ...)
..
7. wait for replies
8. recv(serial=x1, ...)   // there's no guarantee on order of replies
9. callback(serial=x1, ...)
10. recv(serial=x2, ...)
11. callback(serial=x2, ....)

And it's the callback times I'm worried about. I'm not saying we should not add the callback APIs. What I'm really saying is that I doubt they will help Python apps. They will definitely help C applications scale, though.

Of course there's a limit to how many outstanding async calls you can
make before the event loop gets 100% busy processing the responses,
but I don't think that makes async calls worthless. Even if we had the
bulk list API calls, async calling would be useful, because it would
let VDSM fire off requests for disk, net, cpu, mem stats in parallel
from a single thread.
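
To illustrate that last point, a speculative sketch of one thread fanning out the
different stat queries for a single domain (all the *Async names are invented for
illustration):

  # Speculative again: the *Async variants and the 'collect' callback are
  # invented; today each of these is a separate blocking call. 'vda' and
  # 'vnet0' are placeholder device names.
  def fan_out(dom, collect):
      dom.blockStatsAsync('vda', collect)
      dom.interfaceStatsAsync('vnet0', collect)
      dom.getCPUStatsAsync(True, collect)
      dom.memoryStatsAsync(collect)
      # all four queries are now in flight from this one thread;
      # replies arrive via 'collect' as libvirtd answers them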

Regards,
Daniel


Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list



