On Tue, Nov 20, 2012 at 02:39:40PM +0800, Asias He wrote:
> On 11/20/2012 04:26 AM, Michael S. Tsirkin wrote:
> > On Mon, Nov 19, 2012 at 04:53:42PM +0800, Asias He wrote:
> >> vhost-blk is an in-kernel virtio-blk device accelerator.
> >>
> >> Due to the lack of a proper in-kernel AIO interface, this version converts
> >> the guest's I/O requests to bios and uses submit_bio() to submit I/O
> >> directly. So this version only supports raw block devices as the guest's
> >> disk image, e.g. /dev/sda, /dev/ram0. We can add file-based image support
> >> to vhost-blk once we have an in-kernel AIO interface. There is some work
> >> in progress on an in-kernel AIO interface from Dave Kleikamp and Zach
> >> Brown:
> >>
> >> http://marc.info/?l=linux-fsdevel&m=133312234313122
> >>
> >> Performance evaluation:
> >> -----------------------------
> >> 1) LKVM
> >> Fio with libaio ioengine on Fusion IO device using kvm tool
> >> IOPS(k)     Before  After   Improvement
> >> seq-read    107     121     +13.0%
> >> seq-write   130     179     +37.6%
> >> rnd-read    102     122     +19.6%
> >> rnd-write   125     159     +27.0%
> >>
> >> 2) QEMU
> >> Fio with libaio ioengine on Fusion IO device using QEMU
> >> IOPS(k)     Before  After   Improvement
> >> seq-read    76      123     +61.8%
> >> seq-write   139     173     +24.4%
> >> rnd-read    73      120     +64.3%
> >> rnd-write   75      156     +108.0%
> >
> > Could you compare with dataplane qemu as well please?
>
> Well, I will try to collect it.
>
> >>
> >> Userspace bits:
> >> -----------------------------
> >> 1) LKVM
> >> The latest vhost-blk userspace bits for kvm tool can be found here:
> >> git@xxxxxxxxxx:asias/linux-kvm.git blk.vhost-blk
> >>
> >> 2) QEMU
> >> The latest vhost-blk userspace prototype for QEMU can be found here:
> >> git@xxxxxxxxxx:asias/qemu.git blk.vhost-blk
> >>
> >> Changes in v5:
> >> - Do not assume the buffer layout
> >> - Fix wakeup race
> >>
> >> Changes in v4:
> >> - Mark req->status as a userspace pointer
> >> - Use __copy_to_user() instead of copy_to_user() in vhost_blk_set_status()
> >> - Add if (need_resched()) schedule() in blk thread
> >> - Kill vhost_blk_stop_vq() and move it into vhost_blk_stop()
> >> - Use vq_err() instead of pr_warn()
> >> - Fail on unsupported requests
> >> - Add flush in vhost_blk_set_features()
> >>
> >> Changes in v3:
> >> - Send a REQ_FLUSH bio instead of vfs_fsync, thanks Christoph!
> >> - Check that the file passed by the user is a raw block device file
> >>
> >> Signed-off-by: Asias He <asias@xxxxxxxxxx>
> >
> > Since there are files shared by this and vhost net,
> > it's easiest for me to merge this all through the
> > vhost tree.
> >
> > Jens, could you ack this and the bio usage in this driver
> > please?
> >
> >> ---
> >>  drivers/vhost/Kconfig     |   1 +
> >>  drivers/vhost/Kconfig.blk |  10 +
> >>  drivers/vhost/Makefile    |   2 +
> >>  drivers/vhost/blk.c       | 697 ++++++++++++++++++++++++++++++++++++++++++++++
> >>  drivers/vhost/blk.h       |   8 +
> >>  5 files changed, 718 insertions(+)
> >>  create mode 100644 drivers/vhost/Kconfig.blk
> >>  create mode 100644 drivers/vhost/blk.c
> >>  create mode 100644 drivers/vhost/blk.h
> >>
> >> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> >> index 202bba6..acd8038 100644
> >> --- a/drivers/vhost/Kconfig
> >> +++ b/drivers/vhost/Kconfig
> >> @@ -11,4 +11,5 @@ config VHOST_NET
> >>
> >>  if STAGING
> >>  source "drivers/vhost/Kconfig.tcm"
> >> +source "drivers/vhost/Kconfig.blk"
> >>  endif
> >> diff --git a/drivers/vhost/Kconfig.blk b/drivers/vhost/Kconfig.blk
> >> new file mode 100644
> >> index 0000000..ff8ab76
> >> --- /dev/null
> >> +++ b/drivers/vhost/Kconfig.blk
> >> @@ -0,0 +1,10 @@
> >> +config VHOST_BLK
> >> +	tristate "Host kernel accelerator for virtio blk (EXPERIMENTAL)"
> >> +	depends on BLOCK && EXPERIMENTAL && m
> >> +	---help---
> >> +	  This kernel module can be loaded in host kernel to accelerate
> >> +	  guest block with virtio_blk. Not to be confused with virtio_blk
> >> +	  module itself which needs to be loaded in guest kernel.
> >> +
> >> +	  To compile this driver as a module, choose M here: the module will
> >> +	  be called vhost_blk.
> >> diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
> >> index a27b053..1a8a4a5 100644
> >> --- a/drivers/vhost/Makefile
> >> +++ b/drivers/vhost/Makefile
> >> @@ -2,3 +2,5 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o
> >>  vhost_net-y := vhost.o net.o
> >>
> >>  obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
> >> +obj-$(CONFIG_VHOST_BLK) += vhost_blk.o
> >> +vhost_blk-y := blk.o
> >> diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
> >> new file mode 100644
> >> index 0000000..f0f118a
> >> --- /dev/null
> >> +++ b/drivers/vhost/blk.c
> >> @@ -0,0 +1,697 @@
> >> +/*
> >> + * Copyright (C) 2011 Taobao, Inc.
> >> + * Author: Liu Yuan <tailai.ly@xxxxxxxxxx>
> >> + *
> >> + * Copyright (C) 2012 Red Hat, Inc.
> >> + * Author: Asias He <asias@xxxxxxxxxx>
> >> + *
> >> + * This work is licensed under the terms of the GNU GPL, version 2.
> >> + *
> >> + * virtio-blk server in host kernel.
> >> + */
> >> +
> >> +#include <linux/miscdevice.h>
> >> +#include <linux/module.h>
> >> +#include <linux/vhost.h>
> >> +#include <linux/virtio_blk.h>
> >> +#include <linux/mutex.h>
> >> +#include <linux/file.h>
> >> +#include <linux/kthread.h>
> >> +#include <linux/blkdev.h>
> >> +#include <linux/llist.h>
> >> +
> >> +#include "vhost.c"
> >> +#include "vhost.h"
> >> +#include "blk.h"
> >> +
> >> +static DEFINE_IDA(vhost_blk_index_ida);
> >> +
> >> +enum {
> >> +	VHOST_BLK_VQ_REQ = 0,
> >> +	VHOST_BLK_VQ_MAX = 1,
> >> +};
> >> +
> >> +struct req_page_list {
> >> +	struct page **pages;
> >> +	int pages_nr;
> >> +};
> >> +
> >> +struct vhost_blk_req {
> >> +	struct llist_node llnode;
> >> +	struct req_page_list *pl;
> >> +	struct vhost_blk *blk;
> >> +
> >> +	struct iovec *iov;
> >> +	int iov_nr;
> >> +
> >> +	struct bio **bio;
> >> +	atomic_t bio_nr;
> >> +
> >> +	struct iovec status[1];
> >> +
> >> +	sector_t sector;
> >> +	int write;
> >> +	u16 head;
> >> +	long len;
> >> +};
> >> +
> >> +struct vhost_blk {
> >> +	struct task_struct *host_kick;
> >
> > Could you please add a comment here explaining why
> > is this better than using the builtin vhost thread?
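For readers following along, the request path the cover letter describes (pin the guest buffer behind each iovec, pack the pages into a bio, push it out with submit_bio()) looks roughly like the sketch below. This is a minimal illustration written against the 2012-era block APIs, not code from the patch: the function name and the simplified offset and error handling are made up for clarity, and a real implementation also has to split across multiple bios and keep the pinned pages around until completion, which appears to be what the bio/bio_nr/pl fields in the quoted vhost_blk_req are for.

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/uio.h>

/*
 * Illustrative sketch only, not the patch code: turn one guest iovec into
 * a single bio and submit it.  Page-offset handling, splitting into
 * multiple bios and unpinning the pages at completion are all omitted.
 */
static int sketch_submit_one_iov(struct block_device *bdev, sector_t sector,
				 struct iovec *iov, int write,
				 bio_end_io_t *end_io, void *private)
{
	unsigned long start = (unsigned long)iov->iov_base >> PAGE_SHIFT;
	unsigned long end = ((unsigned long)iov->iov_base + iov->iov_len +
			     PAGE_SIZE - 1) >> PAGE_SHIFT;
	int pages_nr = end - start, i;
	struct page **pages;
	struct bio *bio;

	pages = kmalloc(pages_nr * sizeof(struct page *), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	/*
	 * Pin the userspace (guest) pages backing this iovec.  For a guest
	 * read the device writes into them, hence the !write flag.
	 */
	if (get_user_pages_fast((unsigned long)iov->iov_base, pages_nr,
				!write, pages) != pages_nr) {
		kfree(pages);
		return -EFAULT;
	}

	bio = bio_alloc(GFP_KERNEL, pages_nr);
	if (!bio) {
		kfree(pages);
		return -ENOMEM;
	}
	bio->bi_sector  = sector;	/* pre-3.14 bio field names */
	bio->bi_bdev    = bdev;
	bio->bi_end_io  = end_io;	/* completion handler supplied by caller */
	bio->bi_private = private;

	for (i = 0; i < pages_nr; i++)
		bio_add_page(bio, pages[i], PAGE_SIZE, 0);

	submit_bio(write ? WRITE : READ, bio);
	return 0;
}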
>
> With the separate thread model, request submission and request completion
> can be handled in parallel. I have measured a significant performance
> improvement with this model. In the long term, it would be nice if the
> vhost core could provide multiple-thread support.

Strange, all the completion path does is write out the used ring. I am
guessing you didn't complete requests in a timely manner, which was what
caused the issue. OTOH I'm not sure how increasing the number of threads
scales with the # of VMs. Increased scheduler pressure is also a concern.
Did you try checking completion after each submitted request to address
this?
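To make that suggestion concrete, reaping completions after each submitted request instead of leaving them all to a dedicated thread might look roughly like the sketch below. This is a hypothetical sketch, not the patch: blk->llhead, blk->dev and the vhost_blk_next_req() / vhost_blk_req_submit() / vhost_blk_req_done_one() helpers are assumed names for "the llist that bio completion handlers push finished requests onto", "the embedded vhost_dev", "pop the next request from the vq", "turn it into bios and submit them", and "write the used ring entry and release its pages"; only the inline llist-draining pattern is the point.

/*
 * Hypothetical sketch: drain whatever has completed so far.  Assumes the
 * bio end_io handlers push finished vhost_blk_req entries onto blk->llhead
 * (completion order is ignored here).
 */
static void vhost_blk_drain_done(struct vhost_blk *blk)
{
	struct llist_node *node;
	struct vhost_blk_req *req;

	node = llist_del_all(&blk->llhead);	/* grab all completions at once */
	while (node) {
		req = llist_entry(node, struct vhost_blk_req, llnode);
		node = llist_next(node);
		vhost_blk_req_done_one(req);	/* used ring update, put pages */
	}
}

static void vhost_blk_handle_kick_sketch(struct vhost_blk *blk,
					 struct vhost_virtqueue *vq)
{
	struct vhost_blk_req *req;

	while ((req = vhost_blk_next_req(blk, vq)) != NULL) {
		vhost_blk_req_submit(req);	/* build bios, submit_bio() */
		/* the suggestion: reap completions inline after each submit */
		vhost_blk_drain_done(blk);
	}
	/* one final pass before signaling the guest */
	vhost_blk_drain_done(blk);
	vhost_signal(&blk->dev, vq);
}

The trade-off is that inline reaping keeps completions timely without adding a per-device thread, at the cost of doing the used-ring updates from the submission context; whether that recovers the measured gain on a fast device is what the requested numbers would show.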