Re: [PATCH 2/3] Qemu: add CMT support

> -----Original Message-----
> From: libvir-list-bounces@xxxxxxxxxx [mailto:libvir-list-bounces@xxxxxxxxxx]
> On Behalf Of Ren, Qiaowei
> Sent: Monday, August 10, 2015 9:06 AM
> To: 'Daniel P. Berrange'
> Cc: 'libvir-list@xxxxxxxxxx'
> Subject: Re:  [PATCH 2/3] Qemu: add CMT support
> 
> 
> > -----Original Message-----
> > From: Ren, Qiaowei
> > Sent: Tuesday, July 21, 2015 4:00 PM
> > To: Daniel P. Berrange
> > Cc: libvir-list@xxxxxxxxxx
> > Subject: RE:  [PATCH 2/3] Qemu: add CMT support
> >
> > On Jul 20, 2015 22:34, Daniel P. Berrange wrote:
> > > On Mon, Jul 20, 2015 at 01:50:54PM +0000, Ren, Qiaowei wrote:
> > >>
> > >> Daniel P. Berrange wrote on Jul 20, 2015 17:32:
> > >>> On Sun, Jul 05, 2015 at 07:43:43PM +0800, Qiaowei Ren wrote:
> > >>>> One RFC in
> > >>>> https://www.redhat.com/archives/libvir-list/2015-June/msg01509.html
> > >>>>
> > >>>> CMT (Cache Monitoring Technology) can be used to measure the
> > >>>> cache usage of a VM running on the host. This patch extends the
> > >>>> bulk stats API (virDomainListGetStats) to add this field.
> > >>>> Applications based on libvirt can use this API to retrieve the
> > >>>> cache usage of a VM. Because the CMT implementation in the Linux
> > >>>> kernel is based on the perf mechanism, this patch enables the
> > >>>> perf event for CMT when a VM is created and disables it when the
> > >>>> VM is destroyed.
> > >>>>
> > >>>> Signed-off-by: Qiaowei Ren <qiaowei.ren@xxxxxxxxx>
> > >>>>
> > >>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > >>>> index 4cfae03..8c678c9 100644
> > >>>> --- a/src/qemu/qemu_driver.c
> > >>>> +++ b/src/qemu/qemu_driver.c
> > >>>> @@ -19320,6 +19320,53 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver,
> > >>>>
> > >>>>  #undef QEMU_ADD_COUNT_PARAM
> > >>>> +
> > >>>> +static int
> > >>>> +qemuDomainGetStatsCache(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
> > >>>> +                        virDomainObjPtr dom,
> > >>>> +                        virDomainStatsRecordPtr record,
> > >>>> +                        int *maxparams,
> > >>>> +                        unsigned int privflags ATTRIBUTE_UNUSED)
> > >>>
> > >>> So this is a method that is used to collect per-domain information
> > >>>
> > >>>> +{
> > >>>> +    qemuDomainObjPrivatePtr priv = dom->privateData;
> > >>>> +    FILE *fd;
> > >>>> +    unsigned long long cache = 0;
> > >>>> +    int scaling_factor = 0;
> > >>>> +
> > >>>> +    if (priv->cmt_fd <= 0)
> > >>>> +        return -1;
> > >>>> +
> > >>>> +    if (read(priv->cmt_fd, &cache, sizeof(uint64_t)) < 0) {
> > >>>> +        virReportSystemError(errno, "%s",
> > >>>> +                             _("Unable to read cache data"));
> > >>>> +        return -1;
> > >>>> +    }
> > >>>> +
> > >>>> +    fd = fopen("/sys/devices/intel_cqm/events/llc_occupancy.scale", "r");
> > >>>> +    if (!fd) {
> > >>>> +        virReportSystemError(errno, "%s",
> > >>>> +                             _("Unable to open CMT scale file"));
> > >>>> +        return -1;
> > >>>> +    }
> > >>>> +    if (fscanf(fd, "%d", &scaling_factor) != 1) {
> > >>>> +        virReportSystemError(errno, "%s",
> > >>>> +                             _("Unable to read CMT scale file"));
> > >>>> +        VIR_FORCE_FCLOSE(fd);
> > >>>> +        return -1;
> > >>>> +    }
> > >>>> +    VIR_FORCE_FCLOSE(fd);
> > >>>
> > >>> But this data you are reading is global to the entire host.
> > >>>
> > >>
> > >> In fact this data is per-domain only. When the perf syscall is
> > >> called to enable the perf event for a domain, the pid of that
> > >> domain is passed.
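> > >>
> > >> As a rough sketch of the enabling side (the intel_cqm PMU type id
> > >> is read from sysfs at runtime; the llc_occupancy config value of 1
> > >> is illustrative, taken from the sysfs events file):
> > >>
> > >> #include <stdio.h>
> > >> #include <string.h>
> > >> #include <unistd.h>
> > >> #include <sys/types.h>
> > >> #include <sys/syscall.h>
> > >> #include <linux/perf_event.h>
> > >>
> > >> static int
> > >> openCmtEventForPid(pid_t pid)
> > >> {
> > >>     struct perf_event_attr attr;
> > >>     FILE *fp;
> > >>     int type = 0;
> > >>
> > >>     /* The dynamic intel_cqm PMU registers its type id in sysfs */
> > >>     if (!(fp = fopen("/sys/devices/intel_cqm/type", "r")))
> > >>         return -1;
> > >>     if (fscanf(fp, "%d", &type) != 1) {
> > >>         fclose(fp);
> > >>         return -1;
> > >>     }
> > >>     fclose(fp);
> > >>
> > >>     memset(&attr, 0, sizeof(attr));
> > >>     attr.size = sizeof(attr);
> > >>     attr.type = type;
> > >>     attr.config = 1;  /* llc_occupancy (event=0x01 in sysfs) */
> > >>     attr.inherit = 1; /* cover all threads of the QEMU process */
> > >>
> > >>     /* pid = QEMU pid, cpu = -1: monitor that task on any CPU */
> > >>     return syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
> > >> }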
> > >
> > > Ah, I see - you rely on the open file descriptor to be associated
> > > with the VM pid.
> > >
> > >
> > >>>> @@ -5122,6 +5199,15 @@ void qemuProcessStop(virQEMUDriverPtr driver,
> > >>>>      virPortAllocatorRelease(driver->migrationPorts, priv->nbdPort);
> > >>>>      priv->nbdPort = 0;
> > >>>> +    /* Disable CMT */
> > >>>> +    if (priv->cmt_fd > 0) {
> > >>>
> > >>> You can't rely on keeping an open file descriptor for the guest
> > >>> because libvirtd may be restarted.
> > >>>
> > >>
> > >> Sorry, I don't really get the meaning of this. You mean that when
> > >> libvirtd is restarted, those resources which the domain opened
> > >> should be closed, right?
> > >
> > > No, when libvirtd is restarted, the domains must all continue
> > > running without loss of state. You open the FD when starting the
> > > guest, then libvirtd is restarted, now someone wants to query the
> > > perf data. The perf FD will not be open anymore because libvirtd was
> > > restarted. At least you'd need to re-open the file descriptor when
> > > libvirtd starts up again, for any running guest. I'm not really
> > > convinced we want to keep the perf file descriptors open for all the
> > > domains for the entire time they are running. We should really only
> > > open them when we actually want to read the collected data.
> > >
> >
> > Got it! We should open them only when reading the data and close
> > them again afterwards.
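> >
> > So the read path could become something like this sketch
> > (openCmtEventForPid() is the hypothetical helper sketched above; the
> > scale file is the one the patch already reads):
> >
> > static int
> > readCmtOccupancy(pid_t pid, unsigned long long *bytes)
> > {
> >     unsigned long long raw = 0;
> >     int scale = 0;
> >     FILE *fp;
> >     int fd = openCmtEventForPid(pid);
> >
> >     if (fd < 0)
> >         return -1;
> >
> >     if (read(fd, &raw, sizeof(raw)) != sizeof(raw)) {
> >         close(fd);
> >         return -1;
> >     }
> >     close(fd);  /* nothing stays open across libvirtd restarts */
> >
> >     fp = fopen("/sys/devices/intel_cqm/events/llc_occupancy.scale", "r");
> >     if (!fp)
> >         return -1;
> >     if (fscanf(fp, "%d", &scale) != 1) {
> >         fclose(fp);
> >         return -1;
> >     }
> >     fclose(fp);
> >
> >     *bytes = raw * scale;
> >     return 0;
> > }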
> >
> > >>
> > >>>> +        if (ioctl(priv->cmt_fd, PERF_EVENT_IOC_DISABLE) < 0) {
> > >>>> +            virReportSystemError(errno, "%s",
> > >>>> +                                 _("Unable to disable perf event for CMT"));
> > >>>> +        }
> > >>>> +        VIR_FORCE_CLOSE(priv->cmt_fd);
> > >>>> +    }
> > >>>> +
> > >>>>      if (priv->agent) {
> > >>>>          qemuAgentClose(priv->agent);
> > >>>>          priv->agent = NULL;
> > >>>
> > >>>
> > >>> Conceptually I think this approach to implementation is flawed.
> > >>> While you are turning on/off the perf events for each QEMU
> > >>> process, the data collection does not distinguish data from each
> > >>> QEMU process - the data reported is host wide. So this really
> > >>> doesn't make much sense IMHO.
> > >>>
> > >>
> > >> As mentioned above, the data reported is only for the domain.
> > >>
> > >>> I'm also wondering whether this is really going to be sufficiently
> > >>> useful on its own. CPUs have countless other performance counters
> > >>> that I would imagine apps/admins will want to read in order to
> > >>> analyse QEMU performance, beyond this new CMT feature. The domain
> > >>> stats API won't really scale up to dealing with arbitrary perf
> > >>> event counter reporting, so I'm not much of a fan of just special
> > >>> casing CMT in this way.
> > >>>
> > >>> IOW, if we want to support host performance analysis in libvirt,
> > >>> then we probably want to design a set of APIs specifically for
> > >>> this purpose, but I could well see us saying that this is out of
> > >>> scope for libvirt and apps should just use the Linux perf
> > >>> interfaces directly.
> > >>>
> > >>
> > >> Yes, I get what you mean. Maybe libvirt doesn't have to be
> > >> responsible for supporting host performance analysis.
> > >>
> > >> But I guess cache usage should be important for each domain: if
> > >> apps based on libvirt can retrieve this information, they will be
> > >> able to better check and confirm that a domain works normally,
> > >> like the cpu/memory/block/... stats which are already supported in
> > >> libvirt. Do you think so?
> > >
> > > I'm not saying cache usage is unimportant. There are quite a lot of
> > > other hardware event counters in modern CPUs though, so I'm asking
> > > why we should add just this new special Intel event, and not any of
> > > the other existing performance counters that are useful in
> > > diagnosing performance issues.
> >
> > Ah, I guess you mean that the best way is to provide a set of utility
> > methods for perf operations, like the cgroup support. Then we can use
> > these interfaces to support a lot of the necessary perf counters.
> >
> > If we have to do so, maybe I can try to implement such methods first
> > and then you can help review them.
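> >
> > For example, such a utility layer might look something like this (all
> > names here are hypothetical, just to illustrate the shape, mirroring
> > how vircgroup wraps the cgroup filesystem):
> >
> > typedef enum {
> >     VIR_PERF_EVENT_CMT,   /* llc_occupancy */
> >     /* further counters as needed */
> >     VIR_PERF_EVENT_LAST
> > } virPerfEventType;
> >
> > typedef struct _virPerf virPerf;
> > typedef virPerf *virPerfPtr;
> >
> > virPerfPtr virPerfNew(pid_t pid);
> > void virPerfFree(virPerfPtr perf);
> >
> > int virPerfEventEnable(virPerfPtr perf, virPerfEventType type);
> > int virPerfEventDisable(virPerfPtr perf, virPerfEventType type);
> > int virPerfReadEvent(virPerfPtr perf, virPerfEventType type,
> >                      unsigned long long *value);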
> >
> > > Also, I'm thinking that QEMU has many different threads - VCPU
> > > threads, I/O threads, emulator threads and so on. I could see users
> > > wanting distinct profiling for these different functional areas of
> > > QEMU too, instead of just whole-QEMU granularity.
> > >
> >
> > Yes, I believe such features would be useful for some users. I am
> > currently working on adding features like CMT into OpenStack; as far
> > as I know, profiling for VCPU/IO/emulator threads should not be
> > necessary for OpenStack. ^-^
> >
> 
> Hi Daniel,
> 
> what do you think about it now? If we have to add CMT support in Nova,
> what do you think is the best way? Can we directly use the Linux perf
> interface in OpenStack?
> 

Hi Daniel,

If we first add utility methods for perf operations into libvirt, like the cgroup support, and then implement the CMT feature on top of them, do you think that would be OK?
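
As a sketch of how the CMT stats collection might then look on top of such hypothetical helpers (names as in the illustration above, error handling trimmed):

static int
qemuDomainGetStatsCacheSketch(pid_t vmPid, unsigned long long *cache)
{
    int ret = -1;
    virPerfPtr perf = virPerfNew(vmPid);

    if (!perf)
        return -1;

    if (virPerfEventEnable(perf, VIR_PERF_EVENT_CMT) < 0)
        goto cleanup;

    ret = virPerfReadEvent(perf, VIR_PERF_EVENT_CMT, cache);
    virPerfEventDisable(perf, VIR_PERF_EVENT_CMT);

 cleanup:
    virPerfFree(perf);
    return ret;
}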

Thanks,
Qiaowei

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
