On 09/17/2018 04:59 PM, Lukas Hejtmanek wrote:
> Hello,
>
> so the current domain configuration:
>
> <cpu mode='host-passthrough'>
>   <topology sockets='8' cores='4' threads='1'/>
>   <numa>
>     <cell cpus='0-3' memory='62000000' />
>     <cell cpus='4-7' memory='62000000' />
>     <cell cpus='8-11' memory='62000000' />
>     <cell cpus='12-15' memory='62000000' />
>     <cell cpus='16-19' memory='62000000' />
>     <cell cpus='20-23' memory='62000000' />
>     <cell cpus='24-27' memory='62000000' />
>     <cell cpus='28-31' memory='62000000' />
>   </numa>
> </cpu>
>
> <cputune>
>   <vcpupin vcpu='0' cpuset='0' />
>   <vcpupin vcpu='1' cpuset='1' />
>   <vcpupin vcpu='2' cpuset='2' />
>   <vcpupin vcpu='3' cpuset='3' />
>   <vcpupin vcpu='4' cpuset='4' />
>   <vcpupin vcpu='5' cpuset='5' />
>   <vcpupin vcpu='6' cpuset='6' />
>   <vcpupin vcpu='7' cpuset='7' />
>   <vcpupin vcpu='8' cpuset='8' />
>   <vcpupin vcpu='9' cpuset='9' />
>   <vcpupin vcpu='10' cpuset='10' />
>   <vcpupin vcpu='11' cpuset='11' />
>   <vcpupin vcpu='12' cpuset='12' />
>   <vcpupin vcpu='13' cpuset='13' />
>   <vcpupin vcpu='14' cpuset='14' />
>   <vcpupin vcpu='15' cpuset='15' />
>   <vcpupin vcpu='16' cpuset='16' />
>   <vcpupin vcpu='17' cpuset='17' />
>   <vcpupin vcpu='18' cpuset='18' />
>   <vcpupin vcpu='19' cpuset='19' />
>   <vcpupin vcpu='20' cpuset='20' />
>   <vcpupin vcpu='21' cpuset='21' />
>   <vcpupin vcpu='22' cpuset='22' />
>   <vcpupin vcpu='23' cpuset='23' />
>   <vcpupin vcpu='24' cpuset='24' />
>   <vcpupin vcpu='25' cpuset='25' />
>   <vcpupin vcpu='26' cpuset='26' />
>   <vcpupin vcpu='27' cpuset='27' />
>   <vcpupin vcpu='28' cpuset='28' />
>   <vcpupin vcpu='29' cpuset='29' />
>   <vcpupin vcpu='30' cpuset='30' />
>   <vcpupin vcpu='31' cpuset='31' />
> </cputune>
>
> <numatune>
>   <memnode cellid="0" mode="strict" nodeset="0"/>
>   <memnode cellid="1" mode="strict" nodeset="1"/>
>   <memnode cellid="2" mode="strict" nodeset="2"/>
>   <memnode cellid="3" mode="strict" nodeset="3"/>
>   <memnode cellid="4" mode="strict" nodeset="4"/>
>   <memnode cellid="5" mode="strict" nodeset="5"/>
>   <memnode cellid="6" mode="strict" nodeset="6"/>
>   <memnode cellid="7" mode="strict" nodeset="7"/>
> </numatune>
>
> hopefully, I got it right.

Yes, looking good.

> Good news is that the SPEC benchmark looks promising. The first test,
> bwaves, finished in 1003 seconds compared to 1700 seconds in the
> previous, wrong case. So far so good.

Very well, this means that the config above is correct.

> Bad news is that iozone is still the same. There might be some
> misunderstanding.
>
> I have two cases:
>
> 1) cache=unsafe. In this case, I can see that the hypervisor is prone
> to swap. Swap a lot. It usually eats the whole swap partition and
> kswapd is running at 100% CPU. swappiness, dirty_ratio and company do
> not improve things at all. However, I believe this is simply the wrong
> option for scratch disks where one can expect a huge I/O load.
> Moreover, the hypervisor is a poor machine with only a little memory
> left (ok, in my case about 10GB available), so it does not make sense
> to use that memory for additional cache/disk buffers.

One thing that just occurred to me - is the qcow2 file fully allocated?

# qemu-img info /var/lib/libvirt/images/fedora.qcow2
..
virtual size: 20G (21474836480 bytes)
disk size: 7.0G
..

This is NOT a fully allocated qcow2. (There is a qemu-img sketch for
preallocating the image further down.)

> 2) cache=none. In this case, performance is better (only a few percent
> behind bare metal). However, as soon as the size of the stored data
> approaches the size of the guest's memory, writes stop and iozone eats
> a whole CPU; it looks like it is searching for free pages and that
> gets harder and harder. But I'm not sure, I am not skilled in this
> area.

Hmm. Could it be that the SSD doesn't have enough free blocks and thus
writes are throttled? Can you fstrim it and see if that helps?

> Here you can clearly see that it starts writes, does the writes, then
> takes a pause, writes again, and so on, but the pauses get longer and
> longer:
> https://pastebin.com/2gfPFgb9
> The output runs until the very end of iozone (I cancelled it with
> ctrl-c).
>
> It seems that this is not happening on the 2-NUMA-node machine with
> rotational disks only. It is partly happening on the 2-NUMA-node
> machine with 2 NVMe SSDs; "partly" means that there are also pauses in
> the writes, but it finishes, though at reduced speed. On the
> 1-NUMA-node machine, with the same test, I can see steady writes from
> the very beginning to the very end at roughly the same speed.
>
> Maybe it could be related to the fact that the NVMe is a PCI device
> that is linked to one NUMA node only?

Can be. I don't know qemu internals well enough to say whether it is
capable of doing zero-copy disk writes. (A quick way to check which
node the NVMe card sits on is sketched below.)

> As for iothreads, I have only one disk (the vde) that is exposed to
> high I/O load, so I believe more I/O threads are not applicable here.
> If I understand correctly, I cannot assign more iothreads to a single
> device. And it does not seem to be iothread-related, as the same
> scenario in the 1-NUMA configuration works OK (I mean that the memory
> penalties can be huge since it does not reflect the real NUMA
> topology, but disk speed is OK anyway).

Ah, since it's only one disk, iothreads will not help much here. Still
worth giving it a shot ;-) Remember, iothreads are for all I/O, not
disk I/O only.
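If you do want to give it a shot, the wiring in the domain XML looks
roughly like this - an untested sketch; the thread count, image path
and target are placeholders to adapt to your domain:

  <domain type='kvm'>
    ...
    <!-- one dedicated I/O thread for the guest -->
    <iothreads>1</iothreads>
    <devices>
      <!-- hand the busy scratch disk (vde in your case) to that thread -->
      <disk type='file' device='disk'>
        <driver name='qemu' type='qcow2' cache='none' io='native' iothread='1'/>
        <source file='/var/lib/libvirt/images/scratch.qcow2'/>
        <target dev='vde' bus='virtio'/>
      </disk>
    </devices>
  </domain>

The iothread can also be kept on the right NUMA node, e.g. with
<cputune><iothreadpin iothread='1' cpuset='0-3'/></cputune>, the same
way you pin the vcpus.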
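And back to the allocation question above: if qemu-img info shows the
image as mostly unallocated, one way to rule that out is to run the
benchmark from a fully preallocated image - again only a sketch, the
file names and size are examples, and the conversion should be done
while the domain is shut off:

  # qemu-img convert -f qcow2 -O qcow2 -o preallocation=full \
        /var/lib/libvirt/images/fedora.qcow2 \
        /var/lib/libvirt/images/fedora-full.qcow2

  # qemu-img create -f qcow2 -o preallocation=full \
        /var/lib/libvirt/images/scratch.qcow2 100G

preallocation=falloc is much faster than preallocation=full and is
usually enough to take allocation overhead out of the picture.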
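And for the NVMe locality question: the node a PCI device hangs off
can be read from sysfs - the PCI address below is just a placeholder,
take the real one from lspci:

  # lspci -D | grep -i 'non-volatile'
  # cat /sys/bus/pci/devices/0000:5e:00.0/numa_node

That prints the node number (or -1 if the firmware does not report
locality). If it differs from the node the iozone vcpus and their
memory are pinned to, the disk traffic has to cross the interconnect
either way.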
Anyway, this is the point where I have to say "I don't know". Sorry.
Try contacting the qemu guys:

  qemu-discuss@xxxxxxxxxx
  qemu-devel@xxxxxxxxxx

Michal

_______________________________________________
libvirt-users mailing list
libvirt-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvirt-users