On Fri, Feb 12, 2016 at 06:38:47PM +0100, Premysl Kouril wrote:
> > All of this being said, what are you trying to do?  If you are happy
> > using LVM, feel free to use it.  If there are specific features that
> > you want out of the file system, it's best that you explicitly
> > identify what you want, and so we can minimize the cost of the
> > features of what you want.
>
> We are trying to decide whether to use a filesystem or LVM for VM
> storage. It's not that we are happy with LVM - while it performs
> better, there are limitations on the LVM side, especially when it
> comes to manageability (for example, certain features in OpenStack
> only work if the VM is file-based).
>
> So, in short, if we could make the filesystem perform better we
> would rather use a filesystem than LVM (and we don't really have any
> special requirements in terms of filesystem features).
>
> And in order for us to make a good decision I wanted to ask the
> community whether our observations and resultant numbers make sense.

For ext4, this is what you are going to get. How about you try XFS?
After all, concurrent direct IO writes are something it is rather
good at.

i.e. use XFS in both your host and guest. Use raw image files on the
host, and to make things roughly even with LVM you'll want to
preallocate them. If you don't want to preallocate them (i.e. sparse
image files), set them up with an extent size hint of at least 1MB so
that it limits fragmentation of the image file. Then configure qemu
to use cache=none for its IO to the image file.

On the first write pass to the image file (in either case), you
should see ~70-80% of the native underlying device performance
because there is some overhead in either allocation (sparse image
file) or unwritten extent conversion (preallocated image file). This,
of course, assumes you are not CPU limited in the QEMU process by the
additional CPU overhead of file block mapping in the host filesystem
vs raw block device IO.

On the second write pass you should see 98-99% of the native
underlying device performance (again with the assumption that CPU
overhead of the host filesystem isn't a limiting factor).

As an example, I have a block device that can sustain just under 36k
random 4k write IOPS on my host. I have an XFS filesystem (default
configs) on that 400GB block device. I created a sparse 500TB image
file using:

# xfs_io -f -c "extsize 1m" -c "truncate 500t" vm-500t.img

and pushed it into a 16p/16GB RAM guest via:

-drive file=/mnt/fast-ssd/vm-500t.img,if=virtio,cache=none,format=raw

and in the guest I ran mkfs.xfs with defaults and mounted it with
defaults. Then I ran your fio test on that 5 times in a row:

  write: io=3072.0MB, bw=106393KB/s, iops=26598, runt= 29567msec
  write: io=3072.0MB, bw=141508KB/s, iops=35377, runt= 22230msec
  write: io=3072.0MB, bw=141254KB/s, iops=35313, runt= 22270msec
  write: io=3072.0MB, bw=141115KB/s, iops=35278, runt= 22292msec
  write: io=3072.0MB, bw=141534KB/s, iops=35383, runt= 22226msec

The first run was 26k IOPS; the rest were at 35k IOPS as they
overwrite the same blocks in the image file. IOWs, the first pass ran
at 75% of device capability, the rest at >98% of the host-measured
device capability. All tests reported that the full IO depth was
being used in the guest:

  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%

The guest OS measured about 30% CPU usage for a single fio run at 35k
IOPS:

  real    0m22.648s
  user    0m1.678s
  sys     0m8.175s

However, the QEMU process on the host required 4 entire CPUs to
sustain this IO load, roughly 50/50 user/system time.

IOWs, a large amount of the CPU overhead on such workloads is on the
host side in QEMU, not the guest.
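For anyone wanting to reproduce this: the job I ran was your 4k
random write test from earlier in the thread. A job file along these
lines matches the numbers above; I'm paraphrasing it here, so treat
the exact options (and the placeholder file name) as approximate
rather than as what was actually posted:

  [randwrite]
  filename=/mnt/test/fio-test.file
  rw=randwrite
  bs=4k
  size=3g
  ioengine=libaio
  iodepth=64
  direct=1

size=3g and iodepth=64 are inferred from the io=3072.0MB totals and
the >=64 IO depth distribution reported above.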
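And if you want to compare the preallocated image file case I
mentioned above, something roughly equivalent would be to preallocate
the file instead of just setting an extent size hint (again, the path
and size are only examples, not what I tested here):

# xfs_io -f -c "falloc 0 300g" /mnt/fast-ssd/vm-prealloc.img

and hand it to the guest with the same qemu options:

-drive file=/mnt/fast-ssd/vm-prealloc.img,if=virtio,cache=none,format=raw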
Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx