On Tue, Nov 6, 2018 at 10:40 AM Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> Previously vhost_blk.ko implementations were basically the same thing as
> the QEMU x-data-plane=on (dedicated thread using Linux AIO), except they
> were using a kernel thread and maybe submitted bios.
>
> The performance differences weren't convincing enough that it seemed
> worthwhile maintaining another code path which loses live migration, I/O
> throttling, image file formats, etc (all the things that QEMU's block
> layer supports).
>
> Two changes since then:
>
> 1. x-data-plane=on has been replaced with a full trip down QEMU's block
> layer (-object iothread,id=iothread0 -device
> virtio-blk-pci,iothread=iothread0,...).  It's slower and not truly
> multiqueue (yet!).
>
> So from this perspective vhost_blk.ko might be more attractive again, at
> least until further QEMU block layer work eliminates the multiqueue and
> performance overheads.

Yes, this work is a direct consequence of the insufficient performance of
virtio-blk's host side. I'm working on a storage driver, but there is no
good way to feed all of that I/O into one disk of one VM. The nature of
the storage design dictates the need for very high IOPS as seen by the
VM. This is only one tiny use case, of course, but the vhost/QEMU change
is small enough to share.

> 2. SPDK has become available for users who want the best I/O performance
> and are willing to sacrifice CPU cores for polling.
>
> If you want better performance and don't care about QEMU block layer
> features, could you use SPDK?  People who are the target market for
> vhost_blk.ko would probably be willing to use SPDK and it already
> exists...

Yes. Though in my experience SPDK creates more problems than it solves
most of the time ;)

What I find very compelling about using a plain Linux block device is
that it is really fast these days (blk-mq), and the device mapper can be
used for even greater flexibility. The device mapper is less than perfect
performance-wise and will need some work at some point for sure, but it
can still push a few million IOPS through. And it is all standard code
with decades-old user APIs.

In fact, the Linux kernel is so good now that our pure-software solution
can push I/O at rates up to the limits of fat hardware (x00 GbE, a bunch
of NVMes) without an apparent need for hardware acceleration. And,
without hardware dependencies, it is much more flexible. The disk
interface between the host and the VM was the only major bottleneck.

> From the QEMU userspace perspective, I think the best way to integrate
> vhost_blk.ko is to transparently switch to it when possible.  If the
> user enables QEMU block layer features that are incompatible with
> vhost_blk.ko, then it should fall back to the QEMU block layer
> transparently.

Sounds like an excellent idea! I'll do that. Most of the vhost-blk
support in QEMU is boilerplate code anyway.

> I'm not keen on yet another code path with its own set of limitations
> and having to educate users about how to make the choice.  But if it can
> be integrated transparently as an "accelerator", then it could be
> valuable.

Understood. Agreed. Thanks!

--
wbr, Vitaly
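
For reference, the iothread-based configuration Stefan refers to above
usually looks roughly like this (the image path and the drive/iothread
ids are placeholders):

    qemu-system-x86_64 \
        -object iothread,id=iothread0 \
        -drive file=disk.img,if=none,id=drive0,format=raw,cache=none,aio=native \
        -device virtio-blk-pci,drive=drive0,iothread=iothread0

Here the virtio-blk virtqueue is serviced by a dedicated iothread instead
of QEMU's main loop, but every request still takes the full trip through
the QEMU block layer.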
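
And as a trivial sketch of the device-mapper flexibility mentioned above,
a linear dm target can be stacked on top of a raw NVMe namespace and the
result handed to the VM as its disk (device names and the target layout
are illustrative):

    # size of the backing device in 512-byte sectors
    SECTORS=$(blockdev --getsz /dev/nvme0n1)
    # map the whole namespace 1:1 through device-mapper
    dmsetup create vm-disk0 --table "0 $SECTORS linear /dev/nvme0n1 0"
    # /dev/mapper/vm-disk0 can now serve as the VM's backing block device

More interesting stacks (dm-stripe, dm-thin, concatenation of several
linear segments, etc.) are built the same way, all on standard,
long-stable interfaces.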