* Daniel Veillard <veillard@xxxxxxxxxx> [2007-10-11 08:01]:
> There are a few things I gathered on this issue. This affects
> NUMA setups, where basically if a domain must be placed on a given
> cell it is not good to let the hypervisor place it first with its
> own heuristics and then later migrate it to a different set of CPUs;
> it is better to instruct the hypervisor to start said domain on the
> given set.
> - For Xen it is possible to instruct the hypervisor by passing
>   (cpus '2,3') in the SExpr, where the argument is a list of
>   the physical processors allowed

A bit more detail here just FYI: Xen takes the cpu list and converts
it into an affinity bitmap that is then applied to each vcpu
allocated to the guest.

> - For KVM I think the standard way would be to select the
>   cpuset using sched_setaffinity() between the fork of the
>   current process and the exec of the qemu process

Yep (a rough sketch of that fork/exec window is appended at the end
of this message).

> - there is no need (from a NUMA perspective) to do fine-grained
>   allocation at that point; as long as the domain can be restricted
>   to a given cell at startup, virDomainPinVcpu() can be used later,
>   if needed, to do more precise pinning and try to optimize
>   placement

kvm-46 added user-space allocated memory, which means that we can use
libnuma/numactl to set the appropriate node (also sketched below).

> - to be able to instruct the hypervisor at creation time, adding the
>   information to the domain XML description looks like the most
>   natural way (another option would be to force the use of
>   virDomainDefineXML, add a call using the resulting virDomainPtr to
>   define the set, and then use virDomainCreate to do the actual
>   start)
>   + the good point of having this embedded in the XML is that we
>     still have all the information about the domain settings in the
>     XML if we want to restart it later
>   + the bad point is that we need to fetch and carry this extra
>     information when doing XML dumps so as not to lose it, for
>     example when manipulating the domain to add or remove devices
> - extracting a cpuset can still be a heavy operation; for example,
>   when using xend one needs one RPC per vcpu in the domain, the
>   cpuset being constructed by logically OR'ing all the cpumaps used
>   by the vcpus of the domain (though in most cases this will be the
>   full map after the first CPU and the scan can be stopped
>   immediately)

Yeah, that might be a decent patch to xend - build up an array of
affinity masks for each vcpu (the client-side OR'ing is sketched
below as well).

> - for the mapping at the XML level I suggest using a simple
>   extension to <vcpu>n</vcpu>, extending it to
>   <vcpu cpuset='2,3'>n</vcpu>
>   with a limited syntax which is just the comma-separated list of
>   allowed CPU numbers (if the code actually detects that such a
>   cpuset is in effect, i.e. in general this won't be added)

I think we should support the same cpuset notation that Xen supports,
which means including ranges (1-4) and negation (^1).  These two
features make describing large ranges much more compact (a small
parser sketch for that notation follows below).

>
> Internally, implementing this should not be too hard; I would
> probably refactor some of the existing parsing code and provide
> functions to get the cpuset and the number of physical processors.
>
> Does this sound okay?

Yeah, I think this covers everything we'd need.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@xxxxxxxxxx
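
For reference, a minimal sketch of the KVM fork/exec approach
discussed above; the qemu path, its arguments, and the CPUs in the
mask are made-up placeholders, not anything libvirt currently does:

#define _GNU_SOURCE
#include <sched.h>      /* cpu_set_t, CPU_ZERO, CPU_SET, sched_setaffinity */
#include <unistd.h>     /* fork, execv */
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;

    /* hypothetical cpuset: restrict the guest to physical CPUs 2,3 */
    CPU_ZERO(&mask);
    CPU_SET(2, &mask);
    CPU_SET(3, &mask);

    pid_t child = fork();
    if (child < 0) {
        perror("fork");
        return 1;
    }
    if (child == 0) {
        /* child: set the affinity *before* exec'ing qemu, so every
         * vcpu thread qemu creates inherits the restricted mask */
        if (sched_setaffinity(0, sizeof(mask), &mask) < 0) {
            perror("sched_setaffinity");
            _exit(1);
        }
        /* placeholder qemu invocation */
        char *args[] = { "/usr/bin/qemu-kvm", "-m", "512", NULL };
        execv(args[0], args);
        perror("execv");    /* only reached if exec fails */
        _exit(1);
    }
    /* parent: would normally track the child; elided here */
    return 0;
}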
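
For the kvm-46 user-space memory point, something like the following
libnuma calls could run in the same pre-exec window; the node number
is purely illustrative (link with -lnuma):

#include <numa.h>       /* numa_available, numa_run_on_node, ... */
#include <stdio.h>

/* Restrict the current process (and, after exec, qemu plus the guest
 * memory it allocates) to a single NUMA node.  The node argument is
 * just an example value chosen by the caller. */
static int bind_to_node(int node)
{
    if (numa_available() < 0) {
        fprintf(stderr, "host has no NUMA support\n");
        return -1;
    }
    if (numa_run_on_node(node) < 0) {   /* run on that node's CPUs */
        perror("numa_run_on_node");
        return -1;
    }
    numa_set_preferred(node);           /* allocate memory from it */
    return 0;
}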
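
On the extraction cost: a single virDomainGetVcpus() call already
returns all the per-vcpu cpumaps, so a client-side version of the
OR'ing with early termination might look roughly like this (buffer
sizing and the all-ones test are simplified; the last byte of a real
map may contain unused bits):

#include <stdlib.h>
#include <string.h>
#include <libvirt/libvirt.h>

/* OR together the cpumaps of every vcpu of a domain to build the
 * domain-wide cpuset, stopping early once the map saturates, since
 * in most cases it is the full map after the first vcpu or two. */
static int domain_cpuset(virDomainPtr dom, unsigned char *cpuset,
                         int maplen, int nvcpus)
{
    virVcpuInfoPtr info = malloc(nvcpus * sizeof(*info));
    unsigned char *maps = calloc(nvcpus, maplen);
    int n, v, b;

    if (!info || !maps)
        goto error;
    n = virDomainGetVcpus(dom, info, nvcpus, maps, maplen);
    if (n < 0)
        goto error;

    memset(cpuset, 0, maplen);
    for (v = 0; v < n; v++) {
        int full = 1;
        for (b = 0; b < maplen; b++) {
            cpuset[b] |= maps[v * maplen + b];
            if (cpuset[b] != 0xff)
                full = 0;
        }
        if (full)       /* map saturated: no need to scan further */
            break;
    }
    free(info);
    free(maps);
    return 0;

error:
    free(info);
    free(maps);
    return -1;
}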
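
Finally, a rough cut at parsing the extended notation (ranges and
negation) into a per-CPU map, e.g. "1-4,^3,6"; a real version would
want stricter error reporting:

#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse a Xen-style cpuset string such as "1-4,^3,6" into map[],
 * where map[i] != 0 means physical CPU i is allowed.  Entries apply
 * left to right, so "1-4,^3" enables CPUs 1, 2 and 4.
 * Returns 0 on success, -1 on a malformed string. */
static int parse_cpuset(const char *str, char *map, int maxcpu)
{
    memset(map, 0, maxcpu);
    while (*str) {
        int neg = 0, start, end;
        char *endp;

        if (*str == '^') {              /* negation prefix */
            neg = 1;
            str++;
        }
        if (!isdigit((unsigned char)*str))
            return -1;
        start = end = (int)strtol(str, &endp, 10);
        str = endp;
        if (*str == '-') {              /* a range like 1-4 */
            str++;
            if (!isdigit((unsigned char)*str))
                return -1;
            end = (int)strtol(str, &endp, 10);
            str = endp;
        }
        if (start > end || end >= maxcpu)
            return -1;
        for (int i = start; i <= end; i++)
            map[i] = !neg;
        if (*str == ',')
            str++;
        else if (*str != '\0')
            return -1;
    }
    return 0;
}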