On Fri, Feb 12, 2016 at 06:38:47PM +0100, Premysl Kouril wrote:
> > All of this being said, what are you trying to do?  If you are happy
> > using LVM, feel free to use it.  If there are specific features that
> > you want out of the file system, it's best that you explicitly
> > identify what you want, and so we can minimize the cost of the
> > features of what you want.
>
> We are trying to decide whether to use a filesystem or LVM for VM
> storage. It's not that we are happy with LVM - while it performs
> better, there are limitations on the LVM side, especially when it
> comes to manageability (for example, certain features in OpenStack
> only work if the VM is file-based).
>
> So, in short, if we could make the filesystem perform better we
> would rather use a filesystem than LVM (and we don't really have any
> special requirements in terms of filesystem features).
>
> And in order for us to make a good decision I wanted to ask the
> community whether our observations and resultant numbers make sense.

For ext4, this is what you are going to get. How about you try XFS?
After all, concurrent direct IO writes are something it is rather
good at.

i.e. use XFS in both your host and guest. Use raw image files on the
host, and to make things roughly even with LVM you'll want to
preallocate them. If you don't want to preallocate them (i.e. sparse
image files), set them up with an extent size hint of at least 1MB so
that it limits fragmentation of the image file. Then configure qemu
to use cache=none for its IO to the image file.

On the first write pass to the image file (in either case), you
should see ~70-80% of the native underlying device performance
because there is some overhead in either allocation (sparse image
file) or unwritten extent conversion (preallocated image file). This,
of course, assumes you are not CPU limited in the QEMU process by the
additional CPU overhead of file block mapping in the host filesystem
vs raw block device IO.

On the second write pass you should see 98-99% of the native
underlying device performance (again with the assumption that CPU
overhead of the host filesystem isn't a limiting factor).

As an example, I have a block device that can sustain just under 36k
random 4k write IOPS on my host. I have an XFS filesystem (default
configs) on that 400GB block device. I created a sparse 500TB image
file using:

# xfs_io -f -c "extsize 1m" -c "truncate 500t" vm-500t.img

and pushed it into a 16p/16GB RAM guest via:

-drive file=/mnt/fast-ssd/vm-500t.img,if=virtio,cache=none,format=raw

and in the guest I ran mkfs.xfs with defaults and mounted it with
defaults. Then I ran your fio test on that 5 times in a row:

  write: io=3072.0MB, bw=106393KB/s, iops=26598, runt= 29567msec
  write: io=3072.0MB, bw=141508KB/s, iops=35377, runt= 22230msec
  write: io=3072.0MB, bw=141254KB/s, iops=35313, runt= 22270msec
  write: io=3072.0MB, bw=141115KB/s, iops=35278, runt= 22292msec
  write: io=3072.0MB, bw=141534KB/s, iops=35383, runt= 22226msec

The first run was 26k IOPS; the rest were at 35k IOPS as they
overwrite the same blocks in the image file. IOWs, the first pass ran
at 75% of device capability, the rest at >98% of the host-measured
device capability. All tests reported that the full IO depth was
being used in the guest:

  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%

The guest OS measured about 30% CPU usage for a single fio run at 35k
IOPS:

  real    0m22.648s
  user    0m1.678s
  sys     0m8.175s

However, the QEMU process on the host required 4 entire CPUs to
sustain this IO load, roughly 50/50 user/system time.

IOWs, a large amount of the CPU overhead on such workloads is on the
host side in QEMU, not the guest.
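For anyone wanting to reproduce this: the job I ran was your 4k
random write test from earlier in the thread. A job file along these
lines matches the numbers above; I'm paraphrasing it here, so treat
the exact options (and the placeholder file name) as approximate
rather than as what was actually posted:

  [randwrite]
  filename=/mnt/test/fio-test.file
  rw=randwrite
  bs=4k
  size=3g
  ioengine=libaio
  iodepth=64
  direct=1

size=3g and iodepth=64 are inferred from the io=3072.0MB totals and
the >=64 IO depth distribution reported above.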
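And if you want to compare the preallocated image file case I
mentioned above, something roughly equivalent would be to preallocate
the file instead of just setting an extent size hint (again, the path
and size are only examples, not what I tested here):

# xfs_io -f -c "falloc 0 300g" /mnt/fast-ssd/vm-prealloc.img

and hand it to the guest with the same qemu options:

-drive file=/mnt/fast-ssd/vm-prealloc.img,if=virtio,cache=none,format=raw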
Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx