On Fri, Jun 25, 2010 at 06:32:20PM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 25, 2010 at 04:31:44PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Jun 25, 2010 at 01:43:17PM +0300, Michael S. Tsirkin wrote:
> > > On Fri, Jun 25, 2010 at 12:39:21PM +0930, Rusty Russell wrote:
> > > > On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
> > > > > On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori <anthony@xxxxxxxxxxxxx> wrote:
> > > > > > Shouldn't it be possible to just drop the lock before invoking
> > > > > > virtqueue_kick() and reacquire it afterwards? There's nothing in that
> > > > > > virtqueue_kick() path that the lock is protecting AFAICT.
> > > > >
> > > > > No, that would lead to a race condition because vq->num_added is
> > > > > modified by both virtqueue_add_buf_gfp() and virtqueue_kick().
> > > > > Without a lock held during virtqueue_kick() another vcpu could add
> > > > > bufs while vq->num_added is used and cleared by virtqueue_kick():
> > > >
> > > > Right, this dovetails with another proposed change (was it Michael?)
> > > > where we would update the avail idx inside add_buf, rather than waiting
> > > > until kick. This means a barrier inside add_buf, but that's probably
> > > > fine.
> > > >
> > > > If we do that, then we don't need a lock on virtqueue_kick.
> > > >
> > > > Michael, thoughts?
> > >
> > > Maybe not even that: I think we could just do virtio_wmb()
> > > in add, and keep the mb() in kick.
> > >
> > > What I'm a bit worried about is contention on the cacheline
> > > including index and flags: the more we write to that line,
> > > the worse it gets.
> > >
> > > So need to test performance impact of this change:
> > > I didn't find time to do this yet, as I am trying
> > > to finalize the used index publishing patches.
> > > Any takers?
> > >
> > > Do we see performance improvement after making kick lockless?
> >
> > There was no guest CPU reduction or I/O throughput increase with my
> > patch when running 4 dd iflag=direct bs=4k if=/dev/vdb of=/dev/null
> > processes. However, the lock_stat numbers above show clear improvement
> > of the lock hold/wait times.
> >
> > I was hoping to see guest CPU utilization go down and I/O throughput go
> > up, so there is still investigation to do with my patch in isolation.
> > Although I'd like to try it later, putting my patch on top of your avail
> > idx work is too early because it will be harder to reason about the
> > performance with both patches present at the same time.
> >
> > Stefan
>
> What about host CPU utilization?

There is data available for host CPU utilization; I need to dig it up.

> Also, are you using PARAVIRT_SPINLOCKS?

No. I haven't found much documentation on paravirt spinlocks other than
the commit that introduced them:

commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8
Author: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date:   Mon Jul 7 12:07:51 2008 -0700

    paravirt: introduce a "lock-byte" spinlock implementation

PARAVIRT_SPINLOCKS is not set in the config I use, probably because of
the associated performance issue that causes distros to build without
them:

commit b4ecc126991b30fe5f9a59dfacda046aeac124b2
Author: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date:   Wed May 13 17:16:55 2009 -0700

    x86: Fix performance regression caused by paravirt_ops on native kernels

I would expect performance results to be smoother with PARAVIRT_SPINLOCKS
for the guest kernel. I will enable it for future runs; thanks for
pointing it out.
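
For the archive, here is a rough userspace sketch of the vq->num_added
race and of the "publish avail idx in add_buf" idea as I understand it.
The names mirror drivers/virtio/virtio_ring.c, but the struct and
functions below are illustrative only, not the actual driver code:

/*
 * Simplified sketch, not the real virtio_ring.c.  Barriers are stood in
 * for by __sync_synchronize(); descriptor handling and the host side
 * are elided.
 */
#include <stdint.h>

struct vq_sketch {
	uint16_t avail_idx;       /* index published to the host */
	unsigned int num_added;   /* buffers added since the last kick */
};

/* Today: add_buf and kick both touch num_added, so both need the lock. */
void add_buf_sketch(struct vq_sketch *vq)
{
	/* ... fill in descriptor chain ... */
	vq->num_added++;
}

void kick_locked_sketch(struct vq_sketch *vq)
{
	__sync_synchronize();             /* stands in for virtio_wmb() */
	vq->avail_idx += vq->num_added;   /* publish all pending buffers */
	vq->num_added = 0;
	__sync_synchronize();             /* stands in for the mb() before notify */
	/* ... notify the host ... */
}

/*
 * If kick ran without the lock, another vcpu's add_buf could increment
 * num_added between the read and the clear above, and those buffers
 * would never show up in avail_idx.
 *
 * The proposal: publish avail_idx inside add_buf, with a write barrier
 * ordering the descriptors before the index, so kick no longer touches
 * num_added and can run lockless.
 */
void add_buf_publish_sketch(struct vq_sketch *vq)
{
	/* ... fill in descriptor chain ... */
	__sync_synchronize();             /* virtio_wmb(): descriptors before idx */
	vq->avail_idx++;
}

void kick_lockless_sketch(struct vq_sketch *vq)
{
	(void)vq;
	__sync_synchronize();             /* mb() before checking host flags */
	/* ... notify the host unless it suppressed notifications ... */
}

The trade-off Michael mentions shows up here: bumping avail_idx on every
add_buf means more writes to the cacheline holding the index and flags,
which is exactly the contention he is worried about, so it needs to be
measured.
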
Stefan