On Thu, May 16, 2013 at 1:18 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
> On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
>> > How many CPU cores are you testing on ? That's a good improvement,
>> > but I'd expect the improvement to be greater as the # of cores gets
>> > larger.
>>
>> I'm testing on 12 cores x 2 hyperthreads per core. As I'm working on
>> teasing out software bottlenecks, I'm intentionally running fewer
>> tasks (20 parallel creations) than the number of logical cores (24).
>> The memory, disk and network are also well over-provisioned.
>>
>> > Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
>> > limit a single connection to only 5 RPC calls. Beyond that, calls
>> > queue up, even if libvirtd is otherwise idle. OpenStack uses a
>> > single connection for everything, so it will hit this. I suspect
>> > this would be why virConnectGetLibVersion would appear to be slow.
>> > That API does absolutely nothing of any consequence, so the only
>> > reason I'd expect it to be slow is if you're hitting a libvirtd RPC
>> > limit that causes the call to be queued up.
>>
>> I hadn't tuned libvirtd.conf at all. I have just increased
>> max_{clients,workers,requests,client_requests} to 50 and repeated my
>> experiment (the exact overrides are spelled out a bit further down).
>> As you expected, virConnectGetLibVersion is now very fast.
>> Unfortunately, the median VM creation time didn't change.
>>
>> > I'm not actively doing anything in this area, mostly because I've
>> > got no clear data on where any remaining bottlenecks are.
>>
>> Unless there are other parameters to tweak, I believe I'm still
>> hitting a bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the
>> times for the libvirt calls are:
>>
>>     virConnectDefineXML*:        13 ms  vs  4.5 s
>>     virDomainCreateWithFlags*:   1.8 s  vs   20 s
>>
>> * I had said that virConnectDefineXML wasn't serialized in my first
>> email. I based that observation on a single trace I looked at :-) In
>> the average case, virConnectDefineXML is affected by a bottleneck.
>
> virConnectDefineXML would at least hit the possible bottleneck on the
> virDomainObjListAddLocked method. In fact, that's pretty much the only
> contended lock I'd expect it to hit. Nothing else that it runs has any
> serious locking involved.

Okay, cool, I'll measure this. I'll also try to figure out what
virDomainCreateWithFlags is waiting on.

>> Note that when I took these measurements, I also monitored CPU & disk
>> utilization. During the 20 VM test, both CPU & disk were well below
>> 100% for 97% of the test (i.e., over the 60 s test duration, measured
>> with atop at a 2 second interval, the CPU was only pegged for 2 s).
>>
>> > One theory I had was that the virDomainObjListSearchName method
>> > could be a bottleneck, because that acquires a lock on every single
>> > VM. This is invoked when starting a VM, when we call
>> > virDomainObjListAddLocked. I tried removing this locking though &
>> > didn't see any performance benefit, so never pursued this further.
>> > Before trying things like this again, I think we'd need to find a
>> > way to actually identify where the true bottlenecks are, rather
>> > than guesswork.
>>
>> Testing your hypothesis would be straightforward. I'll add some
>> instrumentation to measure the time spent waiting for the locks and
>> repeat my 20 VM experiment. Or, if there's some systematic lock
>> profiling in place, then I can turn that on and report the results.
>
> There's no lock profiling support built into libvirt. I'm not sure of
> the best way to introduce such support without it impacting the very
> thing we're trying to test. Suggestions welcome.
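Before getting to that, for the record: the libvirtd.conf overrides I
used for the retest mentioned above were roughly the following (50 was
just an arbitrary "comfortably more than 20 parallel creations" value,
not a tuned recommendation):

# /etc/libvirt/libvirtd.conf -- raised from the defaults for the 20 VM
# experiment. The defaults cap a single connection at 5 concurrent RPC
# calls, which makes an OpenStack-style single-connection client queue
# up.
max_clients = 50
max_workers = 50
max_requests = 50
max_client_requests = 50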
A straightforward way to keep lock statistics with low overhead, and
without affecting concurrency, would be to use thread-local storage
(TLS). At the end of a run, or periodically, the per-thread stats could
be aggregated and reported. Since the stats don't have to be precise,
it's OK to do the aggregation racily.

Simple statistics to keep are:

* For each lock L, the time spent waiting.
* For each lock L and call site C, the time spent waiting.

It would probably be sufficient to identify L by the name of the lock's
parent class. If per-instance stats are necessary, then we could add
the address of the object to the identity of L.

So the pseudo code would look something like this:

struct lock_stats {
    /* pseudo code: a map keyed by lock class */
    map of (lock_class) to unsigned long: wait_time;
    /* pseudo code: a map keyed by (lock class, call site) */
    map of (lock_class, stack_trace) to unsigned long: callsite_wait_time;
};

__thread struct lock_stats *lock_stats;

void thread_local_storage_init(void)
{
    lock_stats = calloc(1, sizeof(*lock_stats));
}

/* Return microseconds elapsed since some arbitrary start time. */
unsigned long timestamp(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000UL + ts.tv_nsec / 1000;
}

void virObjectLock(void *anyobj)
{
    virObjectLockablePtr obj = anyobj;
    unsigned long start, elapsed;

    start = timestamp();
    virMutexLock(&obj->lock);
    elapsed = timestamp() - start;

    /* Both maps are per-thread, so no extra locking is needed here. */
    lock_stats->wait_time[obj->parent.klass->name] += elapsed;
    lock_stats->callsite_wait_time[obj->parent.klass->name,
                                   get_stack_trace()] += elapsed;
}

> Perhaps a systemtap script would do a reasonable job at it though,
> e.g. record any stack traces associated with long futex_wait() system
> calls, or something like that.
>
> Regards,
> Daniel
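One loose end in the pseudo code above is get_stack_trace(). A systemtap
script watching for long futex_wait() calls would certainly be less
invasive, but if we do go the in-process route, for a first pass I'd
probably just build a string key out of glibc's backtrace() /
backtrace_symbols(). Roughly the sketch below (untested; it returns the
key through a caller-supplied buffer, only resolves useful names if the
binary exports its symbols, and only needs to run when a wait was
actually observed, so its cost shouldn't matter much):

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 16

/* Build a flat string key for the current call stack, to index the
 * per-thread callsite_wait_time table from the pseudo code above. */
static void get_stack_trace(char *buf, size_t buflen)
{
    void *frames[MAX_FRAMES];
    int nframes = backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, nframes);
    size_t used = 0;
    int i;

    if (buflen > 0)
        buf[0] = '\0';
    if (!symbols)
        return;

    /* Start at frame 1 to skip this helper itself, so that keys from
     * different call sites stay comparable. */
    for (i = 1; i < nframes; i++) {
        int n = snprintf(buf + used, buflen - used, "%s;", symbols[i]);
        if (n < 0 || (size_t)n >= buflen - used)
            break; /* key truncated; good enough for rough stats */
        used += n;
    }

    free(symbols);
}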