On Sun, 2018-11-04 at 11:40 -0500, Vitaly Mayatskih wrote:
> On Sun, Nov 4, 2018 at 6:57 AM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
>
> > Hi!
> > I am also working in this area, and so I am very interested in this driver.
> >
> > > 1 171k 151k 148k 151k 195k 187k 175k
> >
> > If I understand correctly this is fio --numjobs=1?
> > It looks like you are getting better than native performance over bare metal
> > in E, F, G (the vhost-blk cases, in fact). Is this correct?
>
> Yes. At such speeds it is a matter of how the workers are scheduled,
> i.e. how good the batching is. There are other factors why vhost-blk is on
> par with or slightly above fio running in userspace on bare metal,
> but from my observation the right batching all through the stack is
> more important.

I completely agree with you on that. I am currently learning profiling/tracing
to understand the batching (or the lack of it) in the tests I run.

My focus currently is mostly on spdk + native nvme. While for multiple threads
the performance is very close to bare metal, on a single thread I see
significant overhead, which probably relates to batching as well.

> > Could you share the full fio command line you have used?
>
> sysctl -w vm.nr_hugepages=8300; numactl -p 1 -N 1 ./qemu-system-x86_64
> -enable-kvm -cpu host -smp 16 -mem-prealloc -mem-path
> /dev/hugepages/foo -m 8G -nographic -drive
> if=none,id=drive0,format=raw,file=/dev/mapper/mirror-hello,cache=none
> -device virtio-blk-pci,id=blk0,drive=drive0,num-queues=16 -drive
> if=none,id=drive1,format=raw,file=/dev/mapper/mirror-volume,cache=none
> -device vhost-blk-pci,id=blk1,drive=drive1,num-queues=16

Thanks!

> for i in `seq 1 16`; do echo -n "$i "; ./fio --direct=1 --rw=randread
> --ioengine=libaio --bs=4k --iodepth=128 --numjobs=$i --name=foo
> --time_based --runtime=15 --group_reporting --filename=/dev/vda
> --size=10g | grep -Po 'IOPS=[0-9\.]*k'; done
>
> > Which IO device did you use for the test? NVME?
>
> That was LVM mirror over 2 network disks. On the target side it was
> LVM stripe over few NVMe's.

Have you tried to test this on a directly connected NVMe device too?
I think the networking might naturally improve the batching.

> > Which system (cpu model/number of cores/etc) did you test on?
>
> Dual socket: "model name : Intel(R) Xeon(R) Gold 6142 CPU @
> 2.60GHz" with HT enabled, so 64 logical cores in total. The network
> was something from Intel with 53 Gbps PHY and served by fm10k driver.

All right, thanks! I'll test your driver on my system, where I have tested
most of the current solutions.

Best regards,
	Maxim Levitsky
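
P.S. On the batching question, in case it is useful for comparing notes: a
rough sketch of the kind of block-layer tracing I have been experimenting
with (nothing specific to your driver; the events and the 10-second window
are just an example):

    # Count block-layer request issues vs. queue unplugs system-wide for 10s
    # while fio is running; very roughly, more rq_issue events per unplug
    # suggests larger batches being flushed per plug.
    perf stat -e block:block_rq_issue -e block:block_unplug -a -- sleep 10

The same can be run inside the guest against /dev/vda and on bare metal, to
compare how the batching changes between the configurations.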