Re: [RFC] vhost-blk implementation


> Inspired by the vhost-net implementation, I did an initial prototype
> of vhost-blk to see if it provides any benefits over QEMU virtio-blk.
> I haven't handled all the error cases or fixed naming conventions etc.,
> but the implementation is stable enough to play with. I tried not to
> deviate from the vhost-net implementation where possible.

Can you also send the qemu side of it?

> with vhost-blk:
> ----------------
> 
> # time dd if=/dev/vda of=/dev/null bs=128k iflag=direct
> 640000+0 records in
> 640000+0 records out
> 83886080000 bytes (84 GB) copied, 126.135 seconds, 665 MB/s
> 
> real    2m6.137s
> user    0m0.281s
> sys     0m14.725s
> 
> without vhost-blk: (virtio)
> ---------------------------
> 
> # time dd if=/dev/vda of=/dev/null bs=128k iflag=direct
> 640000+0 records in
> 640000+0 records out
> 83886080000 bytes (84 GB) copied, 275.466 seconds, 305 MB/s
> 
> real    4m35.468s
> user    0m0.373s
> sys     0m48.074s

Which caching mode is this?  I assume data=writeback, because otherwise
you'd be doing synchronous I/O directly from the handler.

> +static int do_handle_io(struct file *file, uint32_t type, uint64_t sector,
> +			struct iovec *iov, int in)
> +{
> +	loff_t pos = sector << 8;
> +	int ret = 0;
> +
> +	if (type & VIRTIO_BLK_T_FLUSH)  {
> +		ret = vfs_fsync(file, file->f_path.dentry, 1);
> +	} else if (type & VIRTIO_BLK_T_OUT) {
> +		ret = vfs_writev(file, iov, in, &pos);
> +	} else {
> +		ret = vfs_readv(file, iov, in, &pos);
> +	}
> +	return ret;

I have to admit I don't understand the vhost architecture at all, but
where do the actual data pointers used by the iovecs reside?
vfs_readv/writev expect both the iovec itself and the buffers
pointed to by it to reside in userspace, so just using kernel buffers
here will break badly on architectures with different user/kernel
mappings.  A lot of this is fixable using simple set_fs & co tricks
(roughly like the sketch below), but for direct I/O, which uses
get_user_pages, even that will fail badly.
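To illustrate what I mean by the set_fs trick, here's a rough, untested
sketch; kernel_readv is a made-up helper, not something in the patch or
in mainline:

#include <linux/fs.h>
#include <linux/uio.h>
#include <linux/uaccess.h>

/*
 * Untested sketch: temporarily widen the address limit so that
 * vfs_readv() accepts an iovec and buffers that live in kernel space.
 * Note that this does NOT help for O_DIRECT, which goes through
 * get_user_pages() on the buffer addresses and will still fall over
 * for kernel pointers.
 */
static ssize_t kernel_readv(struct file *file, struct iovec *iov,
			    unsigned long nr_segs, loff_t *pos)
{
	mm_segment_t old_fs = get_fs();
	ssize_t ret;

	set_fs(KERNEL_DS);		/* allow kernel addresses */
	ret = vfs_readv(file, (const struct iovec __user *)iov,
			nr_segs, pos);
	set_fs(old_fs);

	return ret;
}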

Also it seems like you're doing all the I/O synchronously here?  For
data=writeback operations that could explain the read speedup, as
you're avoiding context switches, but for actual write I/O, which has
to get data to disk (either directly from vfs_writev or later through
vfs_fsync), this seems like a really bad idea, stealing a lot of guest
time for work that should happen in the background (e.g. punted off to
a workqueue, roughly as in the sketch below).
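Very rough sketch of what I mean, with made-up names (vhost_blk_io,
handle_io_work); completion signalling back to the guest is left out,
and the address-space caveat above still applies to the vfs_writev call:

#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/uio.h>

/* Made-up per-request container, just for illustration. */
struct vhost_blk_io {
	struct work_struct	work;
	struct file		*file;
	struct iovec		*iov;
	unsigned long		nvecs;
	loff_t			pos;
};

static void handle_io_work(struct work_struct *work)
{
	struct vhost_blk_io *vbio =
		container_of(work, struct vhost_blk_io, work);

	/* The potentially blocking I/O now runs in a worker thread
	 * instead of eating time in the vhost handler
	 * (return value / error handling omitted). */
	vfs_writev(vbio->file, (const struct iovec __user *)vbio->iov,
		   vbio->nvecs, &vbio->pos);

	/* ... signal completion back to the guest here ... */

	kfree(vbio);
}

static void submit_io_async(struct vhost_blk_io *vbio)
{
	INIT_WORK(&vbio->work, handle_io_work);
	schedule_work(&vbio->work);
}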


Other than that the code seems quite nice and simple, but one huge
problem is that it'll only support raw images, and thus misses out
on all the "nice" image formats used in qemu deployments, especially
qcow2.  It's also missing the ioctl magic we have in various places,
both for controlling host devices like cdroms and for SG passthrough.
