Re: [RFC PATCH] vhost-blk: An in-kernel accelerator for virtio-blk

Liu Yuan <namei.unix@xxxxxxxxx> · Mon, 01 Aug 2011 16:55:54 +0800

On 08/01/2011 04:12 PM, Michael S. Tsirkin wrote:
On Mon, Aug 01, 2011 at 02:25:36PM +0800, Liu Yuan wrote:
On 07/28/2011 11:22 PM, Michael S. Tsirkin wrote:
It would be nicer to reuse the worker infrastructure
>from vhost.c. In particular this one ignores cgroups that
the owner belongs to if any.
Does this one do anything that vhost.c doesn't?

The main idea I use a separated thread to handling completion is to
decouple the  request handling
and the request completion signalling. This might allow better
scalability in a IO intensive scenario,
The code seems to have the vq mutex though, isn't that right?
If so, it can't execute in parallel so it's a bit
hard to see how this would help scalability.


Nope, V1 completion thread doesn't has any mutex, thus can run parallel 
with the vhost worker.
Anyway, I'll adopt completion code to the vhost worker, since it deals 
with other stuff like cgroup.

I had a feeling other devices might want some other structure
(not an fd) as a backend. Maybe that was wrong ...
If so, pls do that, and #define VHOST_NET_SET_BACKEND VHOST_SET_BACKEND
for compatibility.


Okay.

  static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
  			 struct iocb *iocb, bool compat)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index d9a5917..6343bc9 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -406,6 +406,7 @@ struct file *eventfd_file_create(unsigned int count, int flags)

  	return file;
  }
+EXPORT_SYMBOL_GPL(eventfd_file_create);
You can avoid the need for this export if you pass
the eventfd in from userspace.

Since eventfd used by completion code is internal and hiding it from
hw/vhost_blk.c would simplify
the configuration, I think this exporting is necessary and can get
rid of unnecessary FD management
in vhost-blk.c.
Well this is a new kernel interface duplicating the functionality of the
old one.  You'll have a hard time selling this idea upstream, I suspect.
And I doubt it simplifies the code significantly.
Further, you have a single vq for block, but net has two and
we do want the flexibility of using a single eventfd for both,
I think.

point taken.

  SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
  {
diff --git a/include/linux/aio.h b/include/linux/aio.h
index 7a8db41..d63bc04 100644
--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -214,6 +214,37 @@ struct mm_struct;
  extern void exit_aio(struct mm_struct *mm);
  extern long do_io_submit(aio_context_t ctx_id, long nr,
  			 struct iocb __user *__user *iocbpp, bool compat);
+extern void __put_ioctx(struct kioctx *ctx);
+extern struct kioctx *ioctx_alloc(unsigned nr_events);
+extern struct kiocb *aio_get_req(struct kioctx *ctx);
+extern ssize_t aio_run_iocb(struct kiocb *iocb);
+extern int __aio_run_iocbs(struct kioctx *ctx);
+extern int read_events(struct kioctx *ctx,
+                        long min_nr, long nr,
+                        struct io_event __user *event,
+                        struct timespec __user *timeout);
+extern void io_destroy(struct kioctx *ioctx);
+extern ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat);
+extern void __put_ioctx(struct kioctx *ctx);
+
+static inline void get_ioctx(struct kioctx *kioctx)
+{
+        BUG_ON(atomic_read(&kioctx->users)<= 0);
+        atomic_inc(&kioctx->users);
+}
+
+static inline int try_get_ioctx(struct kioctx *kioctx)
+{
+        return atomic_inc_not_zero(&kioctx->users);
+}
+
+static inline void put_ioctx(struct kioctx *kioctx)
+{
+        BUG_ON(atomic_read(&kioctx->users)<= 0);
+        if (unlikely(atomic_dec_and_test(&kioctx->users)))
+                __put_ioctx(kioctx);
+}
+
  #else
  static inline ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { return 0; }
  static inline int aio_put_req(struct kiocb *iocb) { return 0; }
--
1.7.5.1
Other comments will be addressed in V2. Thanks

Yuan

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html