Re: [PATCHv7 17/18] qemu: Report cache occupancy (CMT) with domstats

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 11/6/2018 4:04 AM, John Ferlan wrote:
On 10/22/18 4:01 AM, Wang Huaqiang wrote:
Adding the interface in qemu to report CMT statistic information
through command 'virsh domstats --cpu-total'.

Below is a typical output:

         # virsh domstats 1 --cpu-total
         Domain: 'ubuntu16.04-base'
           ...
           cpu.cache.monitor.count=2
           cpu.cache.monitor.0.name=vcpus_1
           cpu.cache.monitor.0.vcpus=1
           cpu.cache.monitor.0.bank.count=2
           cpu.cache.monitor.0.bank.0.id=0
           cpu.cache.monitor.0.bank.0.bytes=4505600
           cpu.cache.monitor.0.bank.1.id=1
           cpu.cache.monitor.0.bank.1.bytes=5586944
           cpu.cache.monitor.1.name=vcpus_4-6
           cpu.cache.monitor.1.vcpus=4,5,6
           cpu.cache.monitor.1.bank.count=2
           cpu.cache.monitor.1.bank.0.id=0
           cpu.cache.monitor.1.bank.0.bytes=17571840
           cpu.cache.monitor.1.bank.1.id=1
           cpu.cache.monitor.1.bank.1.bytes=29106176

Signed-off-by: Wang Huaqiang <huaqiang.wang@xxxxxxxxx>
---
 src/libvirt-domain.c     |   9 ++
 src/libvirt_private.syms |   1 +
 src/qemu/qemu_driver.c   | 229 +++++++++++++++++++++++++++++++++++++++++++++++
 src/util/virresctrl.c    | 130 +++++++++++++++++++++++++++
 src/util/virresctrl.h    |  12 +++
 5 files changed, 381 insertions(+)

I have a feeling I'll be asking for this to be split up a bit, but let's
see...  There's *util, *qemu, and *API code in the same patch.



diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index 7690339..4895f9f 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -11345,6 +11345,15 @@ virConnectGetDomainCapabilities(virConnectPtr conn,
  *     "cpu.user" - user cpu time spent in nanoseconds as unsigned long long.
  *     "cpu.system" - system cpu time spent in nanoseconds as unsigned long
  *                    long.
+ *     "cpu.cache.monitor.count" - tocal cache monitoring groups
+ *     "cpu.cache.monitor.M.name" - name of cache monitoring group 'M'
+ *     "cpu.cache.monitor.M.vcpus" - vcpus for cache monitoring group 'M'
+ *     "cpu.cache.monitor.M.bank.count" - total bank number of cache monitoring
+ *                    group 'M'
+ *     "cpu.cache.monitor.M.bank.N.id" - OS assigned cache bank id for cache
+ *                    'N' in cache monitoring group 'M'
+ *     "cpu.cache.monitor.M.bank.N.bytes" - monitor's cache occupancy of cache
+ *                    bank 'N' in cache monitoring group 'M'
  *
  * VIR_DOMAIN_STATS_BALLOON:
  *     Return memory balloon device information.
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 91801ed..4551767 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2683,6 +2683,7 @@ virResctrlInfoNew;
 virResctrlMonitorAddPID;
 virResctrlMonitorCreate;
 virResctrlMonitorDeterminePath;
+virResctrlMonitorGetCacheOccupancy;
 virResctrlMonitorGetID;
 virResctrlMonitorIsRunning;
 virResctrlMonitorNew;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 12a5f8f..9828118 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -102,6 +102,7 @@
 #include "virnuma.h"
 #include "dirname.h"
 #include "netdev_bandwidth_conf.h"
+#include "c-ctype.h"
Ahh.. this one's a red flag...  Says to me that something should move to
a util function...

Got.  Header file will be removed (actually I planned to remove all code using it.)

 
 #define VIR_FROM_THIS VIR_FROM_QEMU
 
@@ -19698,6 +19699,230 @@ typedef enum {
 #define HAVE_JOB(flags) ((flags) & QEMU_DOMAIN_STATS_HAVE_JOB)
 
 
+/* In terms of the output of virBitmapFormat, both '1-3' and '1,3' are valid
+ * outputs and represent different vcpu set. But it is not easy to
+ * differentiate these two vcpu set formats at first glance.
+ *
+ * This function could be used to clear this ambiguity, it substitutes all '-'
+ * with ',' while generates semantically correct vcpu set.
+ * e.g. vcpu set string '1-3' will be replaced by string '1,2,3'. */
+static char *
+qemuDomainVcpuFormatHelper(const char *vcpus)
+{
+    size_t i = 0;
+    int last = 0;
+    int start = 0;
+    char * tmp = NULL;
+    bool firstnum = true;
+    const char *cur = vcpus;
+    virBuffer buf = VIR_BUFFER_INITIALIZER;
+    char *ret = NULL;
+
+    if (virStringIsEmpty(cur))
+        return NULL;
+
+    while (*cur != '\0') {
+        if (!c_isdigit(*cur))
Explains the #include, but in the long run, I don't think this method is
worth the effort since all you're doing is printing the format in the
output. Is there other libvirt generated output for cpu stats that does
a similar conversion?  If so, share code, if not drop this.

No need to rearrange the range someone else entered and we've
succesfully parsed in some manner. I think the output should look like
the XML output unless there's some compelling reason to change it which
I don't see.

>From above the:

           cpu.cache.monitor.1.name=vcpus_4-6
           cpu.cache.monitor.1.vcpus=4,5,6
would then be:

           cpu.cache.monitor.1.name=vcpus_4-6
           cpu.cache.monitor.1.vcpus=4-6
right?

Yes, this heavy function is just re-formatting the output of vcpus.
I also think it not worth to do this kind of reformat. Let's remove it.


I don't even want to think about someone who does "7,4-6"...

+            goto cleanup;
+
+        if (virStrToLong_i(cur, &tmp, 10, &start) < 0)
+            goto cleanup;
+        if (start < 0)
+            goto cleanup;
+
+        cur = tmp;
+
+        virSkipSpaces(&cur);
+
+        if (*cur == ',' || *cur == 0) {
+            if (!firstnum)
+                virBufferAddChar(&buf, ',');
+            virBufferAsprintf(&buf, "%d", start);
+            firstnum = false;
+        } else if (*cur == '-') {
+            cur++;
+            virSkipSpaces(&cur);
+
+            if (virStrToLong_i(cur, &tmp, 10, &last) < 0)
+                goto cleanup;
+
+            if (last < start)
+                goto cleanup;
+            cur = tmp;
+
+            for (i = start; i <= last; i++) {
+                if (!firstnum)
+
+                    virBufferAddChar(&buf, ',');
+                virBufferAsprintf(&buf, "%ld", i);
+                firstnum = 0;
+            }
+
+            virSkipSpaces(&cur);
+        }
+
+        if (*cur == ',') {
+            cur++;
+            virSkipSpaces(&cur);
+        } else if (*cur == 0) {
+            break;
+        } else {
+            goto cleanup;
+        }
+    }
+
+    ret = virBufferContentAndReset(&buf);
+ cleanup:
+    virBufferFreeAndReset(&buf);
+    return ret;
+}
+
+
+static int
+qemuDomainGetStatsCpuResource(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
+                              virDomainObjPtr dom,
+                              virDomainStatsRecordPtr record,
+                              int *maxparams,
+                              unsigned int privflags ATTRIBUTE_UNUSED,
+                              virResctrlMonitorType restag)
Why bother passing ATTRIBUTE_UNUSED ?

I just copy-pasted the function header from some other function.
Will be removed since it is an internal function.

+{
+    char param_name[VIR_TYPED_PARAM_FIELD_LENGTH];
+    virDomainResctrlMonDefPtr domresmon = NULL;
+    virResctrlMonitorStatsPtr stats = NULL;
+    size_t nstats = 0;
since these are local to the:

+        for (j = 0; j < resctrl->nmonitors; j++) {

loop and there's local's in that for loop, maybe these two should move
there too...

If the lines with virResctrlMonitorIsRunning could be removed, then this loop is
not necessary any more.

virResctrlMonitorIsRunning is validating monitor through verifying PID lists, but it
seems there is no need to do such kind of validation and the PIDs in *tasks file
will not change without control:
-. The PID in *tasks file will be removed by kernel if vcpu process is terminated,
but the vcpu process will not exit except performing vcpu hot-plug.
-. In process of libvirt re-connection, VM will not  make any changes in terms of
resctrl groups and vcpus, so the content of *tasks will not change and the layout
of resctrl groups will not change.
-. Any changes out of libvirt are not allowed, for example, manually creating or
deleting any resctrl groups by bash commands are not allowed.

I decided to remove the lines invoking virResctrlMonitorIsRunning, and the loop
will be removed.

Actually I will split this function into three functions:
qemuDomainGetCpuResMonitorData
qemuDomainGetStatsCpuResMonitorPerTag
and
qemuDomainGetStatsCPUResMonitor

qemuDomainGetCpuResMonitorData will fetch data such as the monitor group
ID, vcpus, and cache information for all monitors, any error happened in this
function will reported to user through virReportError. So the error will be reported
to user before dumping the final result.

qemuDomainGetStatsCpuResMonitorPerTag is responsible for showing the data
retrieved by a successful call of qemuDomainGetCpuResMonitorData. In this function
it does not dump any error message created by virReportError which will interrupt
the showing of cache monitor information.

qemuDomainGetStatsCPUResMonitor is the interface called by qemuDomainGetStatsCPU.
we could invoke qemuDomainGetStatsCpuResMonitorPerTag  with tag
VIR_RESCTRL_MONITOR_TYPE_CACHE for cache monitor.
memBW monitor result will be processed by invoking this function through specify
tag VIR_RESCTRL_MONITOR_TYPE_MEMBW.

+    virDomainResctrlDefPtr resctrl = NULL;
+    unsigned int nmonitors = 0;
+    unsigned int imonitor = 0;
+    const char *restype = NULL;
+    char *rawvcpus = NULL;
+    char *vcpus = NULL;
+    size_t i = 0;
+    size_t j = 0;
+    int ret = -1;
+
+    if (!virDomainObjIsActive(dom))
+        return 0;
+
+    if (restag == VIR_RESCTRL_MONITOR_TYPE_CACHE) {
+        restype = "cache";
+    } else {
+        VIR_DEBUG("Invalid CPU resource type");
VIR_DEBUG? virReportError would seem to be more appropriate. As I've
stated before there will be an error for some reason generated.

Originally, the purpose of using VIR_DEBUG,not virReportError, to report error
is to avoid showing partial monitor information with an error message on screen.

I will split this function into three functions,
qemuDomainGetCpuResMonitorData
qemuDomainGetStatsCpuResMonitorPerTag
and
qemuDomainGetStatsCPUResMonitor

qemuDomainGetCpuResMonitorData will get necessary data and reported
any error occurred to user promptly. qemuDomainGetStatsCpuResMonitorPerTag and
qemuDomainGetStatsCPUResMonitor will report the cache monitor result.
This will avoid the mix of cache monitor result and error message if error occurs.

Please review my new code.

+        return -1;
+    }
+
+    for (i = 0; i < dom->def->nresctrls; i++) {
+        resctrl = dom->def->resctrls[i];
+
+        for (j = 0; j < resctrl->nmonitors; j++) {
+            domresmon = resctrl->monitors[j];
+            if (virResctrlMonitorIsRunning(domresmon->instance) &&
+                domresmon->tag == restag)
Knowing how heavy wait *IsRunning is, the order of checking should be
reversed at the very least.

virResctrlMonitorIsRunning will not be called any more as I stated. and the code
will be changed, my new code will be more efficient.

+                nmonitors++;
+        }
+    }
+
+    if (nmonitors) {
+        snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
+                 "cpu.%s.monitor.count", restype);
+        if (virTypedParamsAddUInt(&record->params,
+                                  &record->nparams,
+                                  maxparams,
+                                  param_name,
+                                  nmonitors) < 0)
+            goto cleanup;
+    }
All that above to ensure nmonitors == resctrl->nmonitors??? I think
there's a lot of unnecessary stuff.

This part will be removed.

Here the purpose of code is to get the total 'valid' (after a successful check
of virResctrlMonitorIsRunning) monitor groups. It actually performing
the validation of  nmonitors == resctrl->nmonitors, just as you said.
Now I think there is no need to validate the monitor any more.

+
+    for (i = 0; i < dom->def->nresctrls; i++) {
+        resctrl = dom->def->resctrls[i];
+
+        for (j = 0; j < resctrl->nmonitors; j++) {
+            size_t l = 0;
+            virResctrlMonitorPtr monitor = resctrl->monitors[j]->instance;
+            const char *id = virResctrlMonitorGetID(monitor);
+
+            if (!id)
+                goto cleanup;
I would think you have other problems in this case. Besides there's no
error message associated with this.

Error message should be reported. And will be reported. Please review the new code.


      
+
+            domresmon = resctrl->monitors[j];
+
+            if (!virResctrlMonitorIsRunning(domresmon->instance))
+                continue;
Oh my one call for each iteration, ouch - highly inefficient.
virResctrlMonitorIsRunning disappears in code and patch for this file
will be redesigned. Please renew the new code.

+
+            if (!(rawvcpus = virBitmapFormat(domresmon->vcpus)))
+                goto cleanup;
+
+            vcpus = qemuDomainVcpuFormatHelper(rawvcpus);
+            if (!vcpus)
+                goto cleanup;
+
+            if (virResctrlMonitorGetCacheOccupancy(monitor, &stats,
+                                                   &nstats) < 0)
+                goto cleanup;
+
+            snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
+                     "cpu.%s.monitor.%d.name", restype, imonitor);
+            if (virTypedParamsAddString(&record->params,
+                                        &record->nparams,
+                                        maxparams,
+                                        param_name,
+                                        id) < 0)
+                goto cleanup;
+
+            snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
+                     "cpu.%s.monitor.%d.vcpus", restype, imonitor);
+
+            if (virTypedParamsAddString(&record->params,
+                                        &record->nparams,
+                                        maxparams,
+                                        param_name,
+                                        vcpus) < 0)
+                goto cleanup;
+
+            snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
+                     "cpu.%s.monitor.%d.bank.count", restype, imonitor);
+            if (virTypedParamsAddUInt(&record->params,
+                                      &record->nparams,
+                                      maxparams,
+                                      param_name,
+                                      nstats) < 0)
+                goto cleanup;
+
+            for (l = 0; l < nstats; l++) {
+                snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
+                         "cpu.%s.monitor.%d.bank.%ld.id",
+                         restype, imonitor, l);
+                if (virTypedParamsAddUInt(&record->params,
+                                          &record->nparams,
+                                          maxparams,
+                                          param_name,
+                                          stats[l].id) < 0)
+                    goto cleanup;
+
+
+                snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
+                         "cpu.%s.monitor.%d.bank.%ld.bytes",
+                         restype, imonitor, l);
+                if (virTypedParamsAddUInt(&record->params,
+                                          &record->nparams,
+                                          maxparams,
+                                          param_name,
+                                          stats[l].val) < 0)
+                    goto cleanup;
+            }
+
+            VIR_FREE(stats);
               nstats = 0;
If virResctrlMonitorGetCacheOccupancy returns without error, nstats will be assigned with
the proper bank group number. If virResctrlMonitorGetCacheOccupancy returns an error,
then code will go through cleanup path and returns.
Seems no need to clear nstats to 0.

Any way in my new code, this nstats will not be used any more, please review the logic of
new code later.

too!

+            VIR_FREE(vcpus);
+            imonitor++;
+        }
+    }
+
+    ret = 0;
+ cleanup:
+    VIR_FREE(vcpus);
+    return ret;
+}
+
+
 static int
 qemuDomainGetStatsCpu(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
                       virDomainObjPtr dom,
@@ -19735,6 +19960,10 @@ qemuDomainGetStatsCpu(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
             return -1;
     }
 
+    if (qemuDomainGetStatsCpuResource(driver, dom, record, maxparams, privflags,
+                                      VIR_RESCTRL_MONITOR_TYPE_CACHE) < 0)
+        return -1;
+
     return 0;
 }
 
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c
index fa3e6e9..02e7095 100644
--- a/src/util/virresctrl.c
+++ b/src/util/virresctrl.c
@@ -2713,3 +2713,133 @@ virResctrlMonitorIsRunning(virResctrlMonitorPtr monitor)
 
     return ret;
 }
+
+
+static int
+virResctrlMonitorStatsSorter(const void *a,
+                             const void *b)
+{
+    return ((virResctrlMonitorStatsPtr)a)->id
+        - ((virResctrlMonitorStatsPtr)b)->id;
+}
+
+
+/*
+ * virResctrlMonitorGetStats
+ *
+ * @monitor: The monitor that the statistic data will be retrieved from.
+ * @resource: The name for resource name. 'llc_occupancy' for cache resource.
+ * "mbm_total_bytes" and "mbm_local_bytes" for memory bandwidth resource.
+ * @stats: Array of virResctrlMonitorStatsPtr for holding cache or memory
+ * bandwidth usage data.
+ * @nstats: A size_t pointer to hold the returned array length of @stats
+ *
+ * Get cache or memory bandwidth utilization information.
+ *
+ * Returns 0 on success, -1 on error.
+ */
+static int
+virResctrlMonitorGetStats(virResctrlMonitorPtr monitor,
+                          const char *resource,
+                          virResctrlMonitorStatsPtr *stats,
+                          size_t *nstats)
+{
+    int rv = -1;
+    int ret = -1;
+    DIR *dirp = NULL;
+    char *datapath = NULL;
+    struct dirent *ent = NULL;
+    virResctrlMonitorStatsPtr stat = NULL;
+
+    if (!monitor) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Invalid resctrl monitor"));
+        return -1;
+    }
+
+    if (virAsprintf(&datapath, "%s/mon_data", monitor->path) < 0)
+        return -1;
+
+    if (virDirOpen(&dirp, datapath) < 0)
+        goto cleanup;
+
+    *nstats = 0;
+    while (virDirRead(dirp, &ent, datapath) > 0) {
+        char *node_id = NULL;
+
+        if (VIR_ALLOC(stat) < 0)
+            goto cleanup;
+
+        /* Looking for directory that contains resource utilization
+         * information file. The directory name is arranged in format
+         * "mon_<node_name>_<node_id>". For example, "mon_L3_00" and
+         * "mon_L3_01" are two target directories for a two nodes system
+         * with resource utilization data file for each node respectively.
+         */
+        if (ent->d_type != DT_DIR)
+            continue;
+
+        /* Looking for directory has a prefix 'mon_L' */
+        if (!(node_id = STRSKIP(ent->d_name, "mon_L")))
+            continue;
+
+        /* Looking for directory has another '_' */
+        node_id = strchr(node_id, '_');
+        if (!node_id)
+            continue;
+
+        /* Skip the character '_' */
+        if (!(node_id = STRSKIP(node_id, "_")))
+            continue;
+
+        /* The node ID number should be here, parsing it. */
+        if (virStrToLong_uip(node_id, NULL, 0, &stat->id) < 0)
+            goto cleanup;
+
+        rv = virFileReadValueUint(&stat->val, "%s/%s/%s", datapath,
+                                  ent->d_name, resource);
+        if (rv == -2) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("File '%s/%s/%s' does not exist."),
+                           datapath, ent->d_name, resource);
+        }
+        if (rv < 0)
+            goto cleanup;
+
+        if (VIR_APPEND_ELEMENT(*stats, *nstats, *stat) < 0)
+            goto cleanup;
+    }
+
+    /* Sort in id's ascending order */
+    qsort(*stats, *nstats, sizeof(*stat), virResctrlMonitorStatsSorter);
Coverity notes that *stats could be NULL - if the above while loop gets
nothing...

Will be changed to
if (*nstats)
    qsort(*stats, *nstats, sizeof(*stat), virResctrlMonitorStatsSorter);

+
+    ret = 0;
+ cleanup:
+    VIR_FREE(datapath);
+    VIR_FREE(stat);
+    VIR_DIR_CLOSE(dirp);
+    return ret;
+}
+
+
+/*
+ * virResctrlMonitorGetCacheOccupancy
+ *
+ * @monitor: The monitor that the statistic data will be retrieved from.
+ * @caches: Array of virResctrlMonitorStatsPtr for receiving cache occupancy
+ * data. Caller is responsible to free this array.
+ * @ncaches: A size_t pointer to hold the returned array length of @caches
They're @stats and @nstats elsewhere, be consistent.

Will use @stats and @nstats for virResctrlMonitorGetCacheOccupancy and later
API (for memBW).

+ *
+ * Get cache or memory bandwidth utilization information.
+ *
+ * Returns 0 on success, -1 on error.
+ */
+
+int
+virResctrlMonitorGetCacheOccupancy(virResctrlMonitorPtr monitor,
+                                   virResctrlMonitorStatsPtr *caches,
+                                   size_t *ncaches)
They're @stats and @nstats elsewhere, be consistent.

Will use @stats and @nstats for parameters.

+{
+    return virResctrlMonitorGetStats(monitor, "llc_occupancy",
+                                     caches, ncaches);
+}
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h
index 8d8fdc2..004c40e 100644
--- a/src/util/virresctrl.h
+++ b/src/util/virresctrl.h
@@ -191,6 +191,13 @@ virResctrlInfoGetMonitorPrefix(virResctrlInfoPtr resctrl,
 typedef struct _virResctrlMonitor virResctrlMonitor;
 typedef virResctrlMonitor *virResctrlMonitorPtr;
 
+typedef struct _virResctrlMonitorStats virResctrlMonitorStats;
+typedef virResctrlMonitorStats *virResctrlMonitorStatsPtr;
+struct _virResctrlMonitorStats {
+    unsigned int id;
+    unsigned int val;
+};
+
 virResctrlMonitorPtr
 virResctrlMonitorNew(void);
 
@@ -222,4 +229,9 @@ virResctrlMonitorRemove(virResctrlMonitorPtr monitor);
 
 bool
 virResctrlMonitorIsRunning(virResctrlMonitorPtr monitor);
+
+int
+virResctrlMonitorGetCacheOccupancy(virResctrlMonitorPtr monitor,
+                                   virResctrlMonitorStatsPtr *caches,
+                                   size_t *ncaches);
They're @stats and @nstats elsewhere, be consistent.

Will use @stats and @nstats for parameters.


      
 #endif /*  __VIR_RESCTRL_H__ */

I'm sure there's more I'll pick up later, but you should have plenty for
now. As to splitting - I think some splitting could be done, I'll leave
it up to you... Lots of change across multiple areas - ask yourself -
can something be separated out?

The virresctrl.* part could be separated, and how about  merging this part with previous patch
that adding more monitor interfaces in virresctrl.*.
I'll do this in my next update for your review.

John


Thanks for review.
Huaqiang
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux