On Thu, Jul 21, 2011 at 08:20:25AM -0500, Adam Litke wrote: > > > On 07/21/2011 06:01 AM, Daniel P. Berrange wrote: > > On Thu, Jul 21, 2011 at 10:11:32AM +0800, Wen Congyang wrote: > >> @@ -5851,7 +5953,6 @@ static int qemuSetSchedulerParametersFlags(virDomainPtr dom, > >> virTypedParameterPtr param = ¶ms[i]; > >> > >> if (STREQ(param->field, "cpu_shares")) { > >> - int rc; > >> if (param->type != VIR_TYPED_PARAM_ULLONG) { > >> qemuReportError(VIR_ERR_INVALID_ARG, "%s", > >> _("invalid type for cpu_shares tunable, expected a 'ullong'")); > >> @@ -5870,19 +5971,47 @@ static int qemuSetSchedulerParametersFlags(virDomainPtr dom, > >> } > >> > >> if (flags & VIR_DOMAIN_AFFECT_CONFIG) { > >> - persistentDef = virDomainObjGetPersistentDef(driver->caps, vm); > >> - if (!persistentDef) { > >> - qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", > >> - _("can't get persistentDef")); > >> + vmdef->cputune.shares = params[i].value.ul; > >> + } > >> + } else if (STREQ(param->field, "cfs_period")) { > > > > [snip] > > > >> + } else if (STREQ(param->field, "cfs_quota")) { > > > > > > In the XML file we now have > > > > <cputune> > > <shares>1024</shares> > > <period>90000</period> > > <quota>0</quota> > > </cputune> > > > > > > But the schedinfo parameter are being named > > > > cpu_shares: 1024 > > cfs_period: 90000 > > cfs_quota: 0 > > > > These new tunables should be named 'cpu_period' and 'cpu_quota' too. > > You mean 'cfs_period' and 'cfs_quota', right? No, I we should *not* be using 'cfs_' in the public interface since this is just an internal impl, of the current Linux schedular. If Linux gains a different schedular in the future which can still provide period/quota caps, the naming 'cfs_period' and 'cfs_quota' will be bad. The XML already has this correct, by not using 'cfs_' in the XML naming. The schedular info parameters are wrong, since they have go the 'cfs_' prefix. > >> + priv = vm->privateData; > >> + if (priv->nvcpupids == 0 || priv->vcpupids[0] == vm->pid) { > >> + /* We do not create sub dir for each vcpu */ > >> + rc = qemuGetVcpuBWLive(cgroup, period, quota); > >> + if (rc < 0) > >> + goto cleanup; > >> + > >> + if (*quota > 0) > >> + *quota /= vm->def->vcpus; > >> + goto out; > >> + } > >> + > >> + /* get period and quota for vcpu0 */ > >> + rc = virCgroupForVcpu(cgroup, 0, &cgroup_vcpu, 0); > >> + if (!cgroup_vcpu) { > >> + virReportSystemError(-rc, > >> + _("Unable to find vcpu cgroup for %s(vcpu: 0)"), > >> + vm->def->name); > >> + goto cleanup; > >> + } > >> + > >> + rc = qemuGetVcpuBWLive(cgroup_vcpu, period, quota); > >> + if (rc < 0) > >> + goto cleanup; > > > > This is also bad IMHO, giving different semantics for the operation > > of the API, depending on whether the QEMU impl has VCPUs threads or > > not. Again this just says to me that we need this to apply at the VM > > level not the VCPU level. > > IMO, this firms up the semantics of the libvirt API. As these patches > are written, the libvirt semantics are: "Apply cpu capping to this > domain. Period is the measurement interval. Quota is the amount of > time each vcpu may run within the period." We are able to insulate the > users from details of the underlying qemu -- whether it supports threads > or not. > > Another reason a per-vcpu quota setting is more desirable than a global > setting is seen when changing the number of vcpus assigned to a domain. > Each time you change the vcpu count, you would have to change the quota > number too. Imagine all of the complaints from users when they increase > the vcpu count and find their domain actually performs worse. I think it is possible to argue that last point either way really, and we in fact already have that behaviour with the existing cpu_shares parameter. Having different behaviour for these new parameters, vs cpu_shares is an even worse problem. Since there is no clear cut answer to your last point either way, this again suggests to me that we should have two sets of CPU tunables here, one applying to the whole VM and other to the per-VCPUs (if supported). > > If we really do need finer per-VCPU controls, in addition to a per-VM > > control, I think that should likely be done as a separate tunable, > > or separate CPU > > > > eg, we could have 2 sets of tunables, one that applies to the VM and > > the second set that adds further controls to the VCPUs. > > How would this behave when qemu doesn't support vcpu threads? We'd > either have to throw an error, or just ignore the vcpu_ settings. > either way, you're exposing users to qemu internals. They would need to > change their domain.xml files when moving from a multi-threaded qemu to > a single-threaded qemu. If the user requests a per-VCPU control and we can't support that we should return an error to the user, rather than silently implementing a different type of control. This doesn't imply that we've exposing users to QEMU internals. We can simply telling them that we have been unable to support the configuration that they requested. This is good, because knowing that the config could not be supported means they have the knowledge to be able to apply a different policy. With this current impl users have no idea we are secretly applying different semantics and limiting the whole VM. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list