On Jul 20, 2015 22:34, Daniel P. Berrange wrote:
> On Mon, Jul 20, 2015 at 01:50:54PM +0000, Ren, Qiaowei wrote:
>>
>> Daniel P. Berrange wrote on Jul 20, 2015 17:32:
>>> On Sun, Jul 05, 2015 at 07:43:43PM +0800, Qiaowei Ren wrote:
>>>> One RFC in
>>>> https://www.redhat.com/archives/libvir-list/2015-June/msg01509.html
>>>>
>>>> CMT (Cache Monitoring Technology) can be used to measure the cache
>>>> usage of a VM running on the host. This patch extends the bulk
>>>> stats API (virDomainListGetStats) to add this field. Applications
>>>> based on libvirt can use this API to retrieve the cache usage of a
>>>> VM. Because the CMT implementation in the Linux kernel is based on
>>>> the perf mechanism, this patch enables the perf event for CMT when
>>>> a VM is created and disables it when the VM is destroyed.
>>>>
>>>> Signed-off-by: Qiaowei Ren <qiaowei.ren@xxxxxxxxx>
>>>>
>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>> index 4cfae03..8c678c9 100644
>>>> --- a/src/qemu/qemu_driver.c
>>>> +++ b/src/qemu/qemu_driver.c
>>>> @@ -19320,6 +19320,53 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver,
>>>>
>>>>  #undef QEMU_ADD_COUNT_PARAM
>>>>
>>>> +static int
>>>> +qemuDomainGetStatsCache(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
>>>> +                        virDomainObjPtr dom,
>>>> +                        virDomainStatsRecordPtr record,
>>>> +                        int *maxparams,
>>>> +                        unsigned int privflags ATTRIBUTE_UNUSED)
>>>
>>> So this is a method that is used to collect per-domain information
>>>
>>>> +{
>>>> +    qemuDomainObjPrivatePtr priv = dom->privateData;
>>>> +    FILE *fd;
>>>> +    unsigned long long cache = 0;
>>>> +    int scaling_factor = 0;
>>>> +
>>>> +    if (priv->cmt_fd <= 0)
>>>> +        return -1;
>>>> +
>>>> +    if (read(priv->cmt_fd, &cache, sizeof(uint64_t)) < 0) {
>>>> +        virReportSystemError(errno, "%s",
>>>> +                             _("Unable to read cache data"));
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    fd = fopen("/sys/devices/intel_cqm/events/llc_occupancy.scale", "r");
>>>> +    if (!fd) {
>>>> +        virReportSystemError(errno, "%s",
>>>> +                             _("Unable to open CMT scale file"));
>>>> +        return -1;
>>>> +    }
>>>> +    if (fscanf(fd, "%d", &scaling_factor) != 1) {
>>>> +        virReportSystemError(errno, "%s",
>>>> +                             _("Unable to read CMT scale file"));
>>>> +        VIR_FORCE_FCLOSE(fd);
>>>> +        return -1;
>>>> +    }
>>>> +    VIR_FORCE_FCLOSE(fd);
>>>
>>> But this data you are reading is global to the entire host.
>>>
>> In fact this data is only per-domain. When the perf syscall is called
>> to enable the perf event for a domain, the pid of that domain is
>> passed.
>
> Ah, I see - you rely on the open file descriptor to be associated with
> the VM pid.
>
>>>> @@ -5122,6 +5199,15 @@ void qemuProcessStop(virQEMUDriverPtr driver,
>>>>      virPortAllocatorRelease(driver->migrationPorts, priv->nbdPort);
>>>>      priv->nbdPort = 0;
>>>>
>>>> +    /* Disable CMT */
>>>> +    if (priv->cmt_fd > 0) {
>>>
>>> You can't rely on keeping an open file descriptor for the guest
>>> because libvirtd may be restarted.
>>>
>> Sorry, I don't quite get the meaning of this. You mean that when
>> libvirtd is restarted, those resources which the domain opened should
>> be closed, right?
>
> No, when libvirtd is restarted, the domains must all continue running
> without loss of state. You open the FD when starting the guest, then
> libvirtd is restarted, now someone wants to query the perf data. The
> perf FD will not be open anymore because libvirtd was restarted. At
> least you'd need to re-open the file descriptor when libvirtd starts
> up again, for any running guest. I'm not really convinced we want to
> keep the perf file descriptors open for all the domains for the entire
> time they are running. Should really only open them when we actually
> want to read the collected data.

Got it! I should only open the perf event when reading the data and
disable it again afterwards.
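Something along these lines, perhaps (just a rough sketch, not the
patch's code; the helper name is made up, and the PMU type and the
llc_occupancy event encoding would first have to be parsed from
/sys/devices/intel_cqm/type and /sys/devices/intel_cqm/events/llc_occupancy):

#include <linux/perf_event.h>
#include <asm/unistd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical helper: read llc_occupancy once for a given QEMU pid,
 * keeping the perf event open only for the duration of the read.
 * 'pmu_type' and 'event_config' are assumed to have been parsed from
 * sysfs beforehand.  Returns 0 on success, -1 on error. */
static int
readCacheOccupancy(pid_t vm_pid, uint32_t pmu_type,
                   uint64_t event_config, uint64_t *value)
{
    struct perf_event_attr attr;
    int fd;
    int ret = -1;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = pmu_type;        /* dynamic type of the intel_cqm PMU */
    attr.config = event_config;  /* encoding of the llc_occupancy event */
    attr.disabled = 1;           /* start disabled, enable explicitly */

    /* Attach to the VM's pid on any CPU (-1), no group fd, no flags. */
    fd = syscall(__NR_perf_event_open, &attr, vm_pid, -1, -1, 0);
    if (fd < 0)
        return -1;

    if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
        goto cleanup;

    if (read(fd, value, sizeof(*value)) != (ssize_t)sizeof(*value))
        goto cleanup;

    ret = 0;

 cleanup:
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    close(fd);
    return ret;
}

With this approach no perf FD has to be kept in the domain private data,
so there is also nothing to re-open when libvirtd restarts. The raw
counter would still be multiplied by the llc_occupancy.scale factor, as
the patch already does.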
>>>> +        if (ioctl(priv->cmt_fd, PERF_EVENT_IOC_DISABLE) < 0) {
>>>> +            virReportSystemError(errno, "%s",
>>>> +                                 _("Unable to disable perf event for CMT"));
>>>> +        }
>>>> +        VIR_FORCE_CLOSE(priv->cmt_fd);
>>>> +    }
>>>> +
>>>>      if (priv->agent) {
>>>>          qemuAgentClose(priv->agent);
>>>>          priv->agent = NULL;
>>>
>>> Conceptually I think this approach to the implementation is flawed.
>>> While you are turning on/off the perf events for each QEMU process,
>>> the data collection does not distinguish data from each QEMU process
>>> - the data reported is host wide. So this really doesn't make much
>>> sense IMHO.
>>>
>> As mentioned above, the data reported is only for the domain.
>>
>>> I'm also wondering whether this is really going to be sufficiently
>>> useful on its own. CPUs have countless other performance counters
>>> that I would imagine apps/admins will want to read in order to
>>> analyse QEMU performance, beyond this new CMT feature. The domain
>>> stats API won't really scale up to dealing with arbitrary perf event
>>> counter reporting, so I'm not much of a fan of just special-casing
>>> CMT in this way.
>>>
>>> IOW, if we want to support host performance analysis in libvirt,
>>> then we probably want to design a set of APIs specifically for this
>>> purpose, but I could well see us saying that this is out of scope
>>> for libvirt and apps should just use the Linux perf interfaces
>>> directly.
>>>
>> Yes, I see what you mean. Maybe libvirt doesn't have to be
>> responsible for supporting host performance analysis.
>>
>> But I guess cache usage should be important for each domain: if apps
>> based on libvirt can obtain this information, they will be able to
>> better check and confirm that the domain is working normally, like
>> the cpu/memory/block/... stats which libvirt already supports. Do you
>> think so?
>
> I'm not saying cache usage is unimportant. There are quite a lot of
> other hardware event counters in modern CPUs though, so I'm asking why
> we should add just this new special Intel event, and not any of the
> other existing performance counters that are useful in diagnosing
> performance issues.

Ah, I guess you mean that the best way is to provide a set of utility
methods for perf operations, like the cgroup support. Then we can use
those interfaces to support all the necessary perf counters. If we go
that way, maybe I can try to implement such methods first and then you
can help review them (a rough sketch of what such an interface might
look like is appended at the end of this mail).

> Also, I'm thinking that QEMU has many different threads - VCPU
> threads, I/O threads, emulator threads and so on. I could see users
> wanting to have distinct profiling for these different functional
> areas of QEMU too, instead of just whole-QEMU granularity.

Yes, I believe such features would be useful for some users. I am
currently working on adding features like CMT into OpenStack, and as
far as I know, distinct profiling of the VCPU / I/O / emulator threads
is not necessary for OpenStack. ^-^

Thanks,
Qiaowei
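P.S. For concreteness, here is a rough sketch of the kind of perf
utility interface I mean above, modelled on how the cgroup helpers are
structured. All names below are placeholders for discussion, not
existing libvirt API:

/* Hypothetical src/util/virperf.h -- a sketch only. */

#ifndef __VIR_PERF_H__
# define __VIR_PERF_H__

# include <stdint.h>
# include <sys/types.h>

typedef enum {
    VIR_PERF_EVENT_CMT,   /* cache occupancy via intel_cqm/llc_occupancy */
    /* other counters (cycles, instructions, cache misses, ...) would be
     * added here without special-casing each one in the drivers */

    VIR_PERF_EVENT_LAST
} virPerfEventType;

typedef struct _virPerf virPerf;
typedef virPerf *virPerfPtr;

virPerfPtr virPerfNew(void);
void virPerfFree(virPerfPtr perf);

/* Open and enable one event for the given task, e.g. the QEMU pid. */
int virPerfEventEnable(virPerfPtr perf,
                       virPerfEventType type,
                       pid_t pid);

/* Disable and close the event again. */
int virPerfEventDisable(virPerfPtr perf,
                        virPerfEventType type);

/* Read the current value of an enabled event. */
int virPerfReadEvent(virPerfPtr perf,
                     virPerfEventType type,
                     uint64_t *value);

#endif /* __VIR_PERF_H__ */

The QEMU driver would then only need to map a new domain stats field
onto one of these event types, instead of open-coding the perf syscalls
for each counter.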