On 09/14/2018 03:36 PM, Lukas Hejtmanek wrote:
> Hello,
>
> ok, I found that cpu pinning was wrong, so I corrected it to be 1:1. The issue
> with iozone remains the same.
>
> The spec is running, however, it runs slower than 1-NUMA case.
>
> The corrected XML looks like follows:

[Reformatted XML for better reading]

  <cpu mode="host-passthrough">
    <topology sockets="8" cores="4" threads="1"/>
    <numa>
      <cell cpus="0-3" memory="62000000"/>
      <cell cpus="4-7" memory="62000000"/>
      <cell cpus="8-11" memory="62000000"/>
      <cell cpus="12-15" memory="62000000"/>
      <cell cpus="16-19" memory="62000000"/>
      <cell cpus="20-23" memory="62000000"/>
      <cell cpus="24-27" memory="62000000"/>
      <cell cpus="28-31" memory="62000000"/>
    </numa>
  </cpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="0"/>
    <vcpupin vcpu="1" cpuset="1"/>
    <vcpupin vcpu="2" cpuset="2"/>
    <vcpupin vcpu="3" cpuset="3"/>
    <vcpupin vcpu="4" cpuset="4"/>
    <vcpupin vcpu="5" cpuset="5"/>
    <vcpupin vcpu="6" cpuset="6"/>
    <vcpupin vcpu="7" cpuset="7"/>
    <vcpupin vcpu="8" cpuset="8"/>
    <vcpupin vcpu="9" cpuset="9"/>
    <vcpupin vcpu="10" cpuset="10"/>
    <vcpupin vcpu="11" cpuset="11"/>
    <vcpupin vcpu="12" cpuset="12"/>
    <vcpupin vcpu="13" cpuset="13"/>
    <vcpupin vcpu="14" cpuset="14"/>
    <vcpupin vcpu="15" cpuset="15"/>
    <vcpupin vcpu="16" cpuset="16"/>
    <vcpupin vcpu="17" cpuset="17"/>
    <vcpupin vcpu="18" cpuset="18"/>
    <vcpupin vcpu="19" cpuset="19"/>
    <vcpupin vcpu="20" cpuset="20"/>
    <vcpupin vcpu="21" cpuset="21"/>
    <vcpupin vcpu="22" cpuset="22"/>
    <vcpupin vcpu="23" cpuset="23"/>
    <vcpupin vcpu="24" cpuset="24"/>
    <vcpupin vcpu="25" cpuset="25"/>
    <vcpupin vcpu="26" cpuset="26"/>
    <vcpupin vcpu="27" cpuset="27"/>
    <vcpupin vcpu="28" cpuset="28"/>
    <vcpupin vcpu="29" cpuset="29"/>
    <vcpupin vcpu="30" cpuset="30"/>
    <vcpupin vcpu="31" cpuset="31"/>
  </cputune>
  <numatune>
    <memory mode="strict" nodeset="0-7"/>
  </numatune>

However, this is not enough. This XML pins only the vCPUs, not the guest memory.
So while, say, vCPU #0 is pinned to physical CPU #0, the memory for guest
NUMA node #0 might still be allocated on host NUMA node #7 (for instance).
You need to add memnode elements to <numatune>:

  <numatune>
    <memnode cellid="0" mode="strict" nodeset="0"/>
    <memnode cellid="1" mode="strict" nodeset="1"/>
    ...
  </numatune>

This ensures that the guest memory is pinned as well.

But wait, there is more. In your later e-mails you mention slow disk I/O.
This can have various causes, but the most obvious one in this case is the
qemu I/O loop, I'd say. Without iothreads qemu has only one I/O loop, so if
your guest issues writes from all 32 cores at once, that single loop cannot
keep up and performance drops. You can try enabling iothreads:

  https://libvirt.org/formatdomain.html#elementsIOThreadsAllocation

This is a qemu feature that allows you to create more I/O threads and also
pin them. This is an example of how to use them:

  https://libvirt.org/git/?p=libvirt.git;a=blob;f=tests/qemuxml2argvdata/iothreads-disk.xml;h=0aa32c392300c0a86ad26185292ebc7a0d85d588;hb=HEAD

And this is an example of how to pin them:

  https://libvirt.org/git/?p=libvirt.git;a=blob;f=tests/qemuxml2argvdata/cputune-iothreads.xml;h=311a1d3604177d9699edf7132a75f387aa57ad6f;hb=HEAD

Also, since iothreads are capable of handling just any I/O, they can be used
for other devices too, not only disks. For instance, interfaces.

Hopefully, this will boost your performance.

Regards,
Michal (who is a bit envious of your machine :-P)
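
P.S. To make the two suggestions concrete, here is a rough sketch of how the
relevant parts of the domain XML might look when put together. Treat it as an
illustration under assumptions, not a tested configuration: it assumes that
guest cell N should live on host NUMA node N, that host CPUs 0-3 belong to
host node 0 and 4-7 to host node 1 (check the real layout with
"virsh capabilities" or "numactl --hardware" first), that four iothreads are
a sensible starting number, and that the guest has a raw virtio disk at a
made-up path. The vcpupin elements stay exactly as in the original XML.

```xml
<!-- Sketch only: a fragment of a domain definition, not a complete guest. -->
<domain type="kvm">
  <!-- Assumption: four I/O threads; tune the count to the workload. -->
  <iothreads>4</iothreads>

  <cputune>
    <!-- The 32 vcpupin elements from the original XML go here unchanged. -->
    <!-- Assumption: these host CPUs are also used by vCPUs, so the
         iothreads share cores with the guest. -->
    <iothreadpin iothread="1" cpuset="0-3"/>
    <iothreadpin iothread="2" cpuset="4-7"/>
    <iothreadpin iothread="3" cpuset="8-11"/>
    <iothreadpin iothread="4" cpuset="12-15"/>
  </cputune>

  <numatune>
    <memory mode="strict" nodeset="0-7"/>
    <!-- Assumption: guest cell N maps to host NUMA node N. -->
    <memnode cellid="0" mode="strict" nodeset="0"/>
    <memnode cellid="1" mode="strict" nodeset="1"/>
    <memnode cellid="2" mode="strict" nodeset="2"/>
    <memnode cellid="3" mode="strict" nodeset="3"/>
    <memnode cellid="4" mode="strict" nodeset="4"/>
    <memnode cellid="5" mode="strict" nodeset="5"/>
    <memnode cellid="6" mode="strict" nodeset="6"/>
    <memnode cellid="7" mode="strict" nodeset="7"/>
  </numatune>

  <devices>
    <!-- Hypothetical disk: the image path and target name are placeholders. -->
    <disk type="file" device="disk">
      <!-- The iothread attribute assigns this disk to iothread 1. -->
      <driver name="qemu" type="raw" iothread="1"/>
      <source file="/var/lib/libvirt/images/guest.img"/>
      <target dev="vda" bus="virtio"/>
    </disk>
  </devices>
</domain>
```

If the host spreads its CPUs across NUMA nodes differently, both the vcpupin
and the memnode mappings need to follow the real topology, otherwise the
strict memory policy pins guest memory away from the CPUs that use it.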