On Jul 20, 2015 22:34, Daniel P. Berrange wrote:
> On Mon, Jul 20, 2015 at 01:50:54PM +0000, Ren, Qiaowei wrote:
>>
>> Daniel P. Berrange wrote on Jul 20, 2015 17:32:
>>> On Sun, Jul 05, 2015 at 07:43:43PM +0800, Qiaowei Ren wrote:
>>>> One RFC in
>>>> https://www.redhat.com/archives/libvir-list/2015-June/msg01509.html
>>>>
>>>> CMT (Cache Monitoring Technology) can be used to measure the cache
>>>> usage of a VM running on the host. This patch extends the bulk
>>>> stats API (virDomainListGetStats) to add this field. Applications
>>>> based on libvirt can use this API to retrieve the cache usage of a
>>>> VM. Because the CMT implementation in the Linux kernel is based on
>>>> the perf mechanism, this patch enables the perf event for CMT when
>>>> a VM is created and disables it when the VM is destroyed.
>>>>
>>>> Signed-off-by: Qiaowei Ren <qiaowei.ren@xxxxxxxxx>
>>>>
>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>> index 4cfae03..8c678c9 100644
>>>> --- a/src/qemu/qemu_driver.c
>>>> +++ b/src/qemu/qemu_driver.c
>>>> @@ -19320,6 +19320,53 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver,
>>>>
>>>>  #undef QEMU_ADD_COUNT_PARAM
>>>>
>>>> +static int
>>>> +qemuDomainGetStatsCache(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
>>>> +                        virDomainObjPtr dom,
>>>> +                        virDomainStatsRecordPtr record,
>>>> +                        int *maxparams,
>>>> +                        unsigned int privflags ATTRIBUTE_UNUSED)
>>>
>>> So this is a method that is used to collect per-domain information
>>>
>>>> +{
>>>> +    qemuDomainObjPrivatePtr priv = dom->privateData;
>>>> +    FILE *fd;
>>>> +    unsigned long long cache = 0;
>>>> +    int scaling_factor = 0;
>>>> +
>>>> +    if (priv->cmt_fd <= 0)
>>>> +        return -1;
>>>> +
>>>> +    if (read(priv->cmt_fd, &cache, sizeof(uint64_t)) < 0) {
>>>> +        virReportSystemError(errno, "%s",
>>>> +                             _("Unable to read cache data"));
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    fd = fopen("/sys/devices/intel_cqm/events/llc_occupancy.scale", "r");
>>>> +    if (!fd) {
>>>> +        virReportSystemError(errno, "%s",
>>>> +                             _("Unable to open CMT scale file"));
>>>> +        return -1;
>>>> +    }
>>>> +    if (fscanf(fd, "%d", &scaling_factor) != 1) {
>>>> +        virReportSystemError(errno, "%s",
>>>> +                             _("Unable to read CMT scale file"));
>>>> +        VIR_FORCE_FCLOSE(fd);
>>>> +        return -1;
>>>> +    }
>>>> +    VIR_FORCE_FCLOSE(fd);
>>>
>>> But this data you are reading is global to the entire host.
>>>
>> In fact this data is only per-domain. When the perf syscall is called
>> to enable the perf event for a domain, the pid of that domain is
>> passed.
>
> Ah, I see - you rely on the open file descriptor to be associated with
> the VM pid.
>
>>>> @@ -5122,6 +5199,15 @@ void qemuProcessStop(virQEMUDriverPtr driver,
>>>>      virPortAllocatorRelease(driver->migrationPorts, priv->nbdPort);
>>>>      priv->nbdPort = 0;
>>>>
>>>> +    /* Disable CMT */
>>>> +    if (priv->cmt_fd > 0) {
>>>
>>> You can't rely on keeping an open file descriptor for the guest
>>> because libvirtd may be restarted.
>>>
>> Sorry, I don't quite get the meaning of this. You mean that when
>> libvirtd is restarted, those resources which the domain opened should
>> be closed, right?
>
> No, when libvirtd is restarted, the domains must all continue running
> without loss of state. You open the FD when starting the guest, then
> libvirtd is restarted, now someone wants to query the perf data. The
> perf FD will not be open anymore because libvirtd was restarted. At
> least you'd need to re-open the file descriptor when libvirtd starts
> up again, for any running guest. I'm not really convinced we want to
> keep the perf file descriptors open for all the domains for the entire
> time they are running. Should really only open them when we actually
> want to read the collected data.

Got it! I should only open the perf event when reading the data and
disable it again afterwards.
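Something along these lines, perhaps (just a rough sketch, not the
patch's code; the helper name is made up, and the PMU type and the
llc_occupancy event encoding would first have to be parsed from
/sys/devices/intel_cqm/type and /sys/devices/intel_cqm/events/llc_occupancy):

#include <linux/perf_event.h>
#include <asm/unistd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical helper: read llc_occupancy once for a given QEMU pid,
 * keeping the perf event open only for the duration of the read.
 * 'pmu_type' and 'event_config' are assumed to have been parsed from
 * sysfs beforehand.  Returns 0 on success, -1 on error. */
static int
readCacheOccupancy(pid_t vm_pid, uint32_t pmu_type,
                   uint64_t event_config, uint64_t *value)
{
    struct perf_event_attr attr;
    int fd;
    int ret = -1;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = pmu_type;        /* dynamic type of the intel_cqm PMU */
    attr.config = event_config;  /* encoding of the llc_occupancy event */
    attr.disabled = 1;           /* start disabled, enable explicitly */

    /* Attach to the VM's pid on any CPU (-1), no group fd, no flags. */
    fd = syscall(__NR_perf_event_open, &attr, vm_pid, -1, -1, 0);
    if (fd < 0)
        return -1;

    if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
        goto cleanup;

    if (read(fd, value, sizeof(*value)) != (ssize_t)sizeof(*value))
        goto cleanup;

    ret = 0;

 cleanup:
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    close(fd);
    return ret;
}

With this approach no perf FD has to be kept in the domain private data,
so there is also nothing to re-open when libvirtd restarts. The raw
counter would still be multiplied by the llc_occupancy.scale factor, as
the patch already does.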
>>>> +        if (ioctl(priv->cmt_fd, PERF_EVENT_IOC_DISABLE) < 0) {
>>>> +            virReportSystemError(errno, "%s",
>>>> +                                 _("Unable to disable perf event for CMT"));
>>>> +        }
>>>> +        VIR_FORCE_CLOSE(priv->cmt_fd);
>>>> +    }
>>>> +
>>>>      if (priv->agent) {
>>>>          qemuAgentClose(priv->agent);
>>>>          priv->agent = NULL;
>>>
>>> Conceptually I think this approach to the implementation is flawed.
>>> While you are turning on/off the perf events for each QEMU process,
>>> the data collection does not distinguish data from each QEMU process
>>> - the data reported is host wide. So this really doesn't make much
>>> sense IMHO.
>>>
>> As mentioned above, the data reported is only for the domain.
>>
>>> I'm also wondering whether this is really going to be sufficiently
>>> useful on its own. CPUs have countless other performance counters
>>> that I would imagine apps/admins will want to read in order to
>>> analyse QEMU performance, beyond this new CMT feature. The domain
>>> stats API won't really scale up to dealing with arbitrary perf event
>>> counter reporting, so I'm not much of a fan of just special-casing
>>> CMT in this way.
>>>
>>> IOW, if we want to support host performance analysis in libvirt,
>>> then we probably want to design a set of APIs specifically for this
>>> purpose, but I could well see us saying that this is out of scope
>>> for libvirt and apps should just use the Linux perf interfaces
>>> directly.
>>>
>> Yes, I see what you mean. Maybe libvirt doesn't have to be
>> responsible for supporting host performance analysis.
>>
>> But I guess cache usage should be important for each domain: if apps
>> based on libvirt can obtain this information, they will be able to
>> better check and confirm that the domain is working normally, like
>> the cpu/memory/block/... stats which libvirt already supports. Do you
>> think so?
>
> I'm not saying cache usage is unimportant. There are quite a lot of
> other hardware event counters in modern CPUs though, so I'm asking why
> we should add just this new special Intel event, and not any of the
> other existing performance counters that are useful in diagnosing
> performance issues.

Ah, I guess you mean that the best way is to provide a set of utility
methods for perf operations, like the cgroup support. Then we can use
those interfaces to support all the necessary perf counters. If we go
that way, maybe I can try to implement such methods first and then you
can help review them (a rough sketch of what such an interface might
look like is appended at the end of this mail).

> Also, I'm thinking that QEMU has many different threads - VCPU
> threads, I/O threads, emulator threads and so on. I could see users
> wanting to have distinct profiling for these different functional
> areas of QEMU too, instead of just whole-QEMU granularity.

Yes, I believe such features would be useful for some users. I am
currently working on adding features like CMT into OpenStack, and as
far as I know, distinct profiling of the VCPU / I/O / emulator threads
is not necessary for OpenStack. ^-^

Thanks,
Qiaowei
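P.S. For concreteness, here is a rough sketch of the kind of perf
utility interface I mean above, modelled on how the cgroup helpers are
structured. All names below are placeholders for discussion, not
existing libvirt API:

/* Hypothetical src/util/virperf.h -- a sketch only. */

#ifndef __VIR_PERF_H__
# define __VIR_PERF_H__

# include <stdint.h>
# include <sys/types.h>

typedef enum {
    VIR_PERF_EVENT_CMT,   /* cache occupancy via intel_cqm/llc_occupancy */
    /* other counters (cycles, instructions, cache misses, ...) would be
     * added here without special-casing each one in the drivers */

    VIR_PERF_EVENT_LAST
} virPerfEventType;

typedef struct _virPerf virPerf;
typedef virPerf *virPerfPtr;

virPerfPtr virPerfNew(void);
void virPerfFree(virPerfPtr perf);

/* Open and enable one event for the given task, e.g. the QEMU pid. */
int virPerfEventEnable(virPerfPtr perf,
                       virPerfEventType type,
                       pid_t pid);

/* Disable and close the event again. */
int virPerfEventDisable(virPerfPtr perf,
                        virPerfEventType type);

/* Read the current value of an enabled event. */
int virPerfReadEvent(virPerfPtr perf,
                     virPerfEventType type,
                     uint64_t *value);

#endif /* __VIR_PERF_H__ */

The QEMU driver would then only need to map a new domain stats field
onto one of these event types, instead of open-coding the perf syscalls
for each counter.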