On Fri, Sep 25, 2015 at 01:48:41PM -0400, Laine Stump wrote:
> On 09/25/2015 01:27 PM, Daniel P. Berrange wrote:
> >On Fri, Sep 25, 2015 at 05:22:30PM +0100, Daniel P. Berrange wrote:
> >>On Fri, Sep 25, 2015 at 11:13:52AM -0400, Laine Stump wrote:
> >>>There's a bit of background about this here:
> >>>
> >>>https://www.redhat.com/archives/augeas-devel/2015-September/msg00001.html
> >>>
> >>>In short, virt-manager is calling the virInterface APIs and that ties
> >>>up a libvirt thread (and CPU core) for a very long time on hosts that
> >>>have a large number of interfaces. These patches don't cure the
> >>>problem (I don't know that there really is a cure other than "Don't DO
> >>>that!"), but they do fix a couple of bugs I found while investigating,
> >>>and make a substantial improvement in the amount of time used by
> >>>virConnectListAllInterfaces().
> >>>
> >>>One thing that I wondered about while investigating this - a big use
> >>>of CPU by virConnectListAllInterfaces() comes from the need to
> >>>retrieve the MAC address of every interface. The MAC addresses are
> >>>both
> >>>
> >>>1) returned to the caller in the interface objects and
> >>>
> >>>2) sent to the policykit ACL checking to decide which interfaces to include in
> >>>the list.
> >>>
> >>>I'm 90% confident that
> >>>
> >>>1) most callers don't really care that they're getting the MAC address
> >>>along with interface name (virt-manager, for example, follows up with
> >>>a virInterfaceGetXMLDesc() anyway), and
> >>>
> >>>2) there is not even a single host *anywhere* that is using libvirt
> >>>policykit ACLs to limit the list of host interfaces viewable by a
> >>>client.
> >>>
> >>>So we could add a flag to not return MAC addresses, which would allow
> >>>cutting down the time to list all devices to something < 1
> >>>second. But this would be at the expense of no longer having the
> >>>possibility to limit the list with policykit according to MAC
> >>>address.
> >>>Since all host interface information is available to all
> >>>users via the file system, for example, I don't see this as a huge
> >>>issue, but it would change behavior, so I don't feel comfortable doing
> >>>it. I don't like the idea of a single API call taking > 1 minute to
> >>>return either, though. Does anyone have an opinion about this?
> >>Ultimately 500 interfaces, each ifcfg-XXX file 300 bytes in size
> >>on average, is only 150 KB of data. Given the amount of data we
> >>are consuming here, I think it is reasonable to expect we can
> >>process that in a tiny fraction of a second. So there's clearly
> >>a serious algorithmic / data structure flaw here if it's taking
> >>minutes.
> >>
> >>By the sounds of the thread you quote, it's in augeas itself, so I
> >>think we really need to focus on addressing that root cause as a
> >>priority rather than try to work around it.
> >>
> >>As a side note, we might consider adding new API to netcf so that
> >>we can fetch the entire interface set + MACs in one API call to
> >>netcf, though I doubt it'd change performance that much.
> >So, I instrumented the netcf and augeas code to check timings.
>
> What did you use? I tried using perf and oprofile, but all I could get them
> to tell me was that a ton of time was being spent in strcmp(), so either it
> couldn't figure out who was the caller due to missing stack frame pointers,
> or I just didn't know the right command-line options. (The last time I did
> any serious profiling I used some custom code (written by someone else at a
> previous employer) that massaged XML-format output from oprofile. A lot has
> changed since then.)
>
> >The aug_get calls take less than a millisecond, as do the various
> >other calls. I found the bulk of the time is actually coming from
> >the netcf function "get_augeas", which in turn calls "aug_load"
> >for every single damn netcf function call.
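A toy C model of the cost structure discussed above may help show why the time goes to minutes rather than milliseconds. It assumes, hypothetically, that listing N interfaces costs one initial load plus one per-interface lookup (for example, to fetch each MAC address), and that every call reloads all N config files, as the get_augeas/aug_load finding suggests. That gives N + N*N = N(N + 1) file parses: for 500 interfaces, 250,500 parses of roughly 300-byte files, about 75 MB of text instead of the 150 KB estimated above. The second mode re-parses only files whose version changed, in the spirit of the aug_load entry in the augeas 0.7.3 release notes. All names here (struct cfg_file, struct loader, load, lookup_one, list_all) are hypothetical and are not netcf or augeas code; the version counter stands in for stat() data, and the model only counts file parses, so it cannot say where aug_load itself spends its time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one ifcfg-XXX file. 'version' stands in for the
 * stat() data (mtime, size) a real loader would compare. */
struct cfg_file {
    unsigned version;        /* bumped whenever the file is edited */
    unsigned loaded_version; /* version seen at the last parse */
    bool loaded;             /* parsed at least once? */
};

/* Hypothetical loader state shared by every API call. */
struct loader {
    struct cfg_file *files;
    size_t nfiles;
    bool changed_only;       /* true: skip files that are unchanged */
    unsigned long parses;    /* number of file parses actually performed */
};

/* Reload step run at the start of every API call. */
static void load(struct loader *l)
{
    for (size_t i = 0; i < l->nfiles; i++) {
        struct cfg_file *f = &l->files[i];
        if (l->changed_only && f->loaded && f->loaded_version == f->version)
            continue;        /* unchanged since the last parse: reuse it */
        /* A real loader would read and parse the file here. */
        f->loaded_version = f->version;
        f->loaded = true;
        l->parses++;
    }
}

/* Hypothetical per-interface query (for example, fetching one MAC
 * address): reload first, then read from the in-memory data. */
static void lookup_one(struct loader *l, size_t idx)
{
    assert(idx < l->nfiles);
    load(l);
    /* ... read the value for interface 'idx' here ... */
}

/* List every interface through a per-interface API: one initial load,
 * then one lookup (and therefore one reload) per interface.
 * Returns the total number of file parses. */
static unsigned long list_all(struct cfg_file *files, size_t n,
                              bool changed_only)
{
    struct loader l = { files, n, changed_only, 0 };

    load(&l);
    for (size_t i = 0; i < n; i++)
        lookup_one(&l, i);
    return l.parses;
}
```

On two separate zero-initialized arrays of 500 entries, list_all(a, 500, false) returns 250500 while list_all(b, 500, true) returns 500; bumping one entry's version in b and listing again in change-aware mode re-parses just that one file.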
>
> I remember David Lutterkort talking about exactly that problem several years
> ago and *thought* I remembered that he had put something into augeas to only
> reread the files if they had changed. Has my memory failed me again? Or is
> augeas doing something and netcf just isn't taking advantage of it?

It's at least in the release notes:

0.7.3 - 2010-08-06
  aug_load: only reparse files that have actually changed; greatly
  speeds up reloading

Cheers,
 -- Guido

>
> >Either we need to stop loading config files on every function
> >call in netcf, or we need to add a netcf bulk data API call,
> >so that libvirt can load /all/ the data it needs in 1 single
> >API call.
>
> I much prefer (1) :-)
>
> --
> libvir-list mailing list
> libvir-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/libvir-list