On Mon, Dec 05, 2016 at 04:12:22PM +0000, Feng, Shaohe wrote:
> Hi all:
>
> As we are know Intel® Xeon phi targets high-performance computing and other
> parallel workloads.
> Now qemu has supported phi virtualization,it is time for libvirt to support
> phi.

Can you provide a pointer to the relevant QEMU changes?

> Different from the traditional X86 server, There is a special numa node with
> Multi-Channel DRAM (MCDRAM) on Phi, but without any CPU .
>
> Now libvirt requires nonempty cpus argument for NUMA node, such as.
> <numa>
>   <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
>   <cell id='1' cpus='240-243' memory='16' unit='GiB'/>
> </numa>
>
> In order to support phi virtualization, libvirt needs to allow a numa cell
> definition without 'cpu' attribution.
>
> Such as:
> <numa>
>   <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
>   <cell id='1' memory='16' unit='GiB'/>
> </numa>
>
> When a cell without 'cpu', qemu will allocate memory by default MCDRAM instead of DDR.

There are two separate concepts at play here, which your description
is mixing up.

First is the question of whether the guest NUMA node can be created
with only RAM or CPUs, or a mix of both.

Second is the question of what kind of host RAM (MCDRAM vs DDR) is used
as the backing store for the guest.

These are separate configuration items which don't need to be conflated
in libvirt. ie we should be able to create a guest with a node containing
only memory, and back that by DDR on the host. Conversely we should be
able to create a guest with a node containing memory + cpus and back that
by MCDRAM on the host (even if that means the vCPUs will end up on a
different host node from their RAM).

On the first point, there still appears to be some brokenness in either
QEMU or Linux wrt configuration of virtual NUMA where either cpus or
memory are absent from nodes.

eg if I launch QEMU with

    -numa node,nodeid=0,cpus=0-3,mem=512
    -numa node,nodeid=1,mem=512
    -numa node,nodeid=2,cpus=4-7
    -numa node,nodeid=3,mem=512
    -numa node,nodeid=4,mem=512
    -numa node,nodeid=5,cpus=8-11
    -numa node,nodeid=6,mem=1024
    -numa node,nodeid=7,cpus=12-15,mem=1024

then the guest reports

  # numactl --hardware
  available: 6 nodes (0,3-7)
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
  node 0 size: 487 MB
  node 0 free: 230 MB
  node 3 cpus: 12 13 14 15
  node 3 size: 1006 MB
  node 3 free: 764 MB
  node 4 cpus:
  node 4 size: 503 MB
  node 4 free: 498 MB
  node 5 cpus:
  node 5 size: 503 MB
  node 5 free: 499 MB
  node 6 cpus:
  node 6 size: 503 MB
  node 6 free: 498 MB
  node 7 cpus:
  node 7 size: 943 MB
  node 7 free: 939 MB

so it has pushed all the CPUs from nodes without RAM into the first node,
and moved the CPUs from node 7 into node 3.

So before considering MCDRAM / Phi, we need to fix this more basic NUMA
topology setup.

> Now here I'd like to discuss these questions:
> 1. This feature is only for Phi at present, but we
>    will check Phi platform for CPU-less NUMA node.
>    The NUMA node without CPU indicates MCDRAM node.

We should not assume such semantics - it is a concept that is specific
to particular Intel x86_64 CPUs. We need to consider that other
architectures may have nodes without CPUs that are backed by normal
DDR. IOW, we should be explicit about the presence of MCDRAM in the host.

>    And for now MCDRAM is available only for PHI.
>    However, there is no reason that any other platform
>    couldn’t define CPU-less NUMA node using libvirt, so
>    there is no reason to check if PHI is used or not.
> 2. Type of memory of CPU-less NUMA node will not be
>    checked during machine creation/configuration step.
>    There is no reliable way to distinguish if the node
>    is MCDRAM or regular DDR. This step is not concerned
>    with type of the memory, only with NUMA assignment.

If we can't distinguish MCDRAM from DDR, that's a problem for apps,
given your next point about MCDRAM not supporting over commit.
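
As an aside on the NUMA topology mismatch shown earlier in this mail: a
quick way to see exactly which nodes differ is to compare the CPU sets
requested via -numa with those the guest reports. The following is only a
rough sketch, not anything from QEMU or libvirt. The function names are
made up for illustration, and it assumes each -numa option carries at most
one cpus= value in the "N" or "N-M" form used above.

```python
import re

# The -numa arguments and the guest's "numactl --hardware" CPU lines,
# copied from the example in this mail (size/free lines omitted).
QEMU_ARGS = """
-numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512
-numa node,nodeid=2,cpus=4-7 -numa node,nodeid=3,mem=512
-numa node,nodeid=4,mem=512 -numa node,nodeid=5,cpus=8-11
-numa node,nodeid=6,mem=1024 -numa node,nodeid=7,cpus=12-15,mem=1024
"""

NUMACTL_OUT = """\
available: 6 nodes (0,3-7)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 3 cpus: 12 13 14 15
node 4 cpus:
node 5 cpus:
node 6 cpus:
node 7 cpus:
"""


def parse_qemu_numa(args):
    """Map nodeid -> set of requested CPUs (hypothetical helper).

    Assumes a single cpus=N or cpus=N-M value per node.
    """
    nodes = {}
    for opt in re.findall(r"-numa\s+node,(\S+)", args):
        fields = dict(kv.split("=", 1) for kv in opt.split(","))
        cpus = set()
        if "cpus" in fields:
            lo, _, hi = fields["cpus"].partition("-")
            cpus = set(range(int(lo), int(hi or lo) + 1))
        nodes[int(fields["nodeid"])] = cpus
    return nodes


def parse_numactl(output):
    """Map node id -> set of CPUs from 'node N cpus: ...' lines."""
    nodes = {}
    for m in re.finditer(r"^node (\d+) cpus:(.*)$", output, re.MULTILINE):
        nodes[int(m.group(1))] = {int(c) for c in m.group(2).split()}
    return nodes


def mismatches(requested, reported):
    """List (node, requested, reported) where the CPU sets differ.

    A value of None means the node is absent on that side.
    """
    result = []
    for node in sorted(set(requested) | set(reported)):
        want, got = requested.get(node), reported.get(node)
        if want != got:
            result.append((node, want, got))
    return result


if __name__ == "__main__":
    diff = mismatches(parse_qemu_numa(QEMU_ARGS), parse_numactl(NUMACTL_OUT))
    for node, want, got in diff:
        print(f"node {node}: requested {want}, guest reports {got}")
```

This only compares CPU placement; it says nothing about whether the fault
lies in QEMU or in the guest kernel.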

> 3. Unlike traditional memory assign to a VM, MCDRAM do not
>    support over commit
>    If the memory of a virtual NUMA node is going to be
>    explicitly bound to physical NUMA node then it shouldn’t
>    exceed the size of its corresponding NUMA node, doesn’t
>    matter if it is MCDRAM or DDR.

It is valid to bind guests to NUMA nodes and still have memory over
commit, so we do need to know if a host node is using MCDRAM or DDR,
so apps can determine whether that node supports over commit or not.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|