Re: KVM / Use of NVMe Drives & Throughput

On 02/01/2018 19:21, Brian Spraker -- BsnTech wrote:
> Hello all,
> 
> Just recently purchased a Samsung 960 Evo NVMe drive.  It is on a PCI
> Express add-in card  in a PCIe 2.0 x4 slot since the motherboard itself
> does not have an M.2 slot.
> 
> On the main host machine, I ran "hdparm -Tt --direct /dev/nvme0n1" and
> get the following:
> 
> Timing O_DIRECT cached reads:   2578 MB in  2.00 seconds = 1289.17 MB/sec
> Timing O_DIRECT disk reads: 3962 MB in  3.00 seconds = 1320.20 MB/sec
> 
> The drive has one Ext4 partition on it, mounted at the /VMs mount
> point with "noatime".
> 
> Host server is running Ubuntu server 16.04 and running "kvm --version"
> shows QEMU emulator version 2.5.0.
> 
> When I run the same "hdparm -Tt --direct /dev/vda" inside the guests,
> it shows quite a bit less:
> 
> Timing O_DIRECT cached reads:   1456 MB in  2.00 seconds = 727.51 MB/sec
> Timing O_DIRECT disk reads: 2730 MB in  3.00 seconds = 909.76 MB/sec
> 
> Guest machines are set up with 10 GB of memory, 8 CPUs (host CPU
> configuration copied), virtio for the disk bus, a raw image file, and
> no cache.  Guest machines are also Ubuntu 16.04.
> 
> Before the upgrade, I was using an SSD in SATA III mode.  The host and
> guest disk reads were only a few MB/sec apart.
> 
> Is there something else I need to look at to see why the guests are only
> getting about half the throughput of the drive?

It's hard to say this conclusively without knowing the exact I/O pattern
that hdparm is using or the results for your SATA SSD.

However, I'd guess your new disk is faster and you're now CPU bound.
Unfortunately, the cost of an interrupt is roughly doubled by
virtualization (because you have to go disk->host->QEMU->guest) and so
is the latency.

If hdparm is only issuing one I/O operation at a time, throughput is
inversely proportional to latency: double the latency, and the
throughput is halved.
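As a rough back-of-the-envelope (the 2 MB request size and the
latencies below are purely illustrative, not measured):

    2 MB per request / 1.5 ms  ~= 1333 MB/sec   (host)
    2 MB per request / 3.0 ms  ~=  667 MB/sec   (guest, latency doubled)

The real request size and latencies will differ, but at queue depth 1
that scaling is what matters.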

Try using aio=native and adding a dedicated iothread for the disk.  That
can give you better throughput, especially if the queue depth (# of I/O
operations active at any one time during the benchmark) is >1.
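If you define the guests through libvirt (e.g. with virt-manager), that
corresponds roughly to something like the following in the domain XML;
the image path below is only a placeholder:

    <domain type='kvm'>
      ...
      <iothreads>1</iothreads>
      <devices>
        <disk type='file' device='disk'>
          <!-- cache='none' you already have; io='native' and
               iothread='1' are the new bits.  The path is a
               placeholder; point it at your image under /VMs. -->
          <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
          <source file='/VMs/guest.img'/>
          <target dev='vda' bus='virtio'/>
        </disk>
      </devices>
    </domain>

QEMU 2.5 supports iothreads; just check that your libvirt is new enough
(>= 1.2.8) to accept the iothread attribute on the <driver> element.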

Thanks,

Paolo
