On 07/21/2011 06:01 AM, Daniel P. Berrange wrote:
> On Thu, Jul 21, 2011 at 10:11:32AM +0800, Wen Congyang wrote:
>> @@ -5851,7 +5953,6 @@ static int qemuSetSchedulerParametersFlags(virDomainPtr dom,
>>          virTypedParameterPtr param = &params[i];
>>
>>          if (STREQ(param->field, "cpu_shares")) {
>> -            int rc;
>>              if (param->type != VIR_TYPED_PARAM_ULLONG) {
>>                  qemuReportError(VIR_ERR_INVALID_ARG, "%s",
>>                                  _("invalid type for cpu_shares tunable, expected a 'ullong'"));
>> @@ -5870,19 +5971,47 @@ static int qemuSetSchedulerParametersFlags(virDomainPtr dom,
>>              }
>>
>>              if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
>> -                persistentDef = virDomainObjGetPersistentDef(driver->caps, vm);
>> -                if (!persistentDef) {
>> -                    qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> -                                    _("can't get persistentDef"));
>> +                vmdef->cputune.shares = params[i].value.ul;
>> +            }
>> +        } else if (STREQ(param->field, "cfs_period")) {
>
> [snip]
>
>> +        } else if (STREQ(param->field, "cfs_quota")) {
>
>
> In the XML file we now have
>
>   <cputune>
>     <shares>1024</shares>
>     <period>90000</period>
>     <quota>0</quota>
>   </cputune>
>
> But the schedinfo parameters are being named
>
>   cpu_shares: 1024
>   cfs_period: 90000
>   cfs_quota: 0
>
> These new tunables should be named 'cpu_period' and 'cpu_quota' too.

You mean 'cfs_period' and 'cfs_quota', right?  The only problem I see with
that naming (specifically for the quota) is that it implies a direct
relationship to the tunable in cgroupfs.  While that is true for
'cpu_shares', it is not true for the quota, since the proper setting to
apply depends on the threading behavior of qemu.

>> +static int
>> +qemuGetVcpusBWLive(virDomainObjPtr vm, virCgroupPtr cgroup,
>> +                   unsigned long long *period, long long *quota)
>> +{
>> +    virCgroupPtr cgroup_vcpu = NULL;
>> +    qemuDomainObjPrivatePtr priv = NULL;
>> +    int rc;
>> +    int ret = -1;
>> +
>> +    priv = vm->privateData;
>> +    if (priv->nvcpupids == 0 || priv->vcpupids[0] == vm->pid) {
>> +        /* We do not create sub dir for each vcpu */
>> +        rc = qemuGetVcpuBWLive(cgroup, period, quota);
>> +        if (rc < 0)
>> +            goto cleanup;
>> +
>> +        if (*quota > 0)
>> +            *quota /= vm->def->vcpus;
>> +        goto out;
>> +    }
>> +
>> +    /* get period and quota for vcpu0 */
>> +    rc = virCgroupForVcpu(cgroup, 0, &cgroup_vcpu, 0);
>> +    if (!cgroup_vcpu) {
>> +        virReportSystemError(-rc,
>> +                             _("Unable to find vcpu cgroup for %s(vcpu: 0)"),
>> +                             vm->def->name);
>> +        goto cleanup;
>> +    }
>> +
>> +    rc = qemuGetVcpuBWLive(cgroup_vcpu, period, quota);
>> +    if (rc < 0)
>> +        goto cleanup;
>
> This is also bad IMHO, giving different semantics for the operation
> of the API, depending on whether the QEMU impl has VCPU threads or
> not.  Again this just says to me that we need this to apply at the VM
> level, not the VCPU level.

IMO, this firms up the semantics of the libvirt API.  As these patches are
written, the libvirt semantics are: "Apply cpu capping to this domain.
Period is the measurement interval.  Quota is the amount of time each vcpu
may run within the period."  We are able to insulate users from the
details of the underlying qemu -- whether it supports vcpu threads or not.

Another reason a per-vcpu quota setting is more desirable than a global
setting becomes apparent when changing the number of vcpus assigned to a
domain.  With a global quota, each time you changed the vcpu count you
would have to change the quota number too.  Imagine the complaints from
users when they increase the vcpu count and find that their domain
actually performs worse.
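To make that concrete, the set path mirrors the get path quoted above, just
in the opposite direction.  The following is only a rough sketch of that
logic, not a verbatim quote from the patch; in particular,
qemuSetVcpuBWLive() is assumed here to be the helper that writes
cpu.cfs_period_us and cpu.cfs_quota_us for a single cgroup:

static int
qemuSetVcpusBWLive(virDomainObjPtr vm, virCgroupPtr cgroup,
                   unsigned long long period, long long quota)
{
    int i, rc;
    virCgroupPtr cgroup_vcpu = NULL;
    qemuDomainObjPrivatePtr priv = vm->privateData;

    if (priv->nvcpupids == 0 || priv->vcpupids[0] == vm->pid) {
        /* No vcpu threads: scale the per-vcpu quota up and apply it
         * to the whole VM cgroup, preserving the user-visible
         * semantics ("quota is per vcpu").  A quota <= 0 means
         * "no limit" and is passed through unscaled. */
        if (quota > 0)
            quota *= vm->def->vcpus;
        return qemuSetVcpuBWLive(cgroup, period, quota);
    }

    /* vcpu threads: apply the same (period, quota) pair to each
     * vcpu's sub-cgroup. */
    for (i = 0; i < priv->nvcpupids; i++) {
        rc = virCgroupForVcpu(cgroup, i, &cgroup_vcpu, 0);
        if (rc < 0 || !cgroup_vcpu) {
            virReportSystemError(-rc,
                                 _("Unable to find vcpu cgroup for %s(vcpu: %d)"),
                                 vm->def->name, i);
            return -1;
        }

        rc = qemuSetVcpuBWLive(cgroup_vcpu, period, quota);
        virCgroupFree(&cgroup_vcpu);
        if (rc < 0)
            return -1;
    }

    return 0;
}

Either way the user-visible guarantee is the same: with period=100000 and
quota=50000, each vcpu may run for 50ms out of every 100ms, whether or not
qemu exposes vcpu threads.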
> If we really do need finer per-VCPU controls, in addition to a per-VM
> control, I think that should likely be done as a separate tunable,
> or separate CPU

How would this behave when qemu doesn't support vcpu threads?  We'd either
have to throw an error or just ignore the vcpu_* settings.  Either way, we
would be exposing users to qemu internals.  They would need to change
their domain.xml files when moving from a multi-threaded qemu to a
single-threaded qemu.

> <cputune>
>   <shares>1024</shares>
>   <period>90000</period>
>   <quota>0</quota>
>
>   <vcpu_shares>1024</vcpu_shares>
>   <vcpu_period>90000</vcpu_period>
>   <vcpu_quota>0</vcpu_quota>
> </cputune>

I actually do like the <vcpu_*> naming because it makes two things
clearer: the units apply at the vcpu level, and these settings don't
necessarily map directly to cgroupfs.  However, I would still maintain
that the current logic should be used for applying the settings (per-vcpu
if vcpu threads are supported; multiply by nvcpus and apply globally if
they are not).

--
Adam Litke
IBM Linux Technology Center

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list