Re: [kvm-devel] [PATCH 00/10] PV-IO v3

Gregory Haskins <ghaskins@xxxxxxxxxx> · Tue, 21 Aug 2007 09:11:05 -0400

On Tue, 2007-08-21 at 15:25 +0300, Avi Kivity wrote:
> Gregory Haskins wrote:
> > On Tue, 2007-08-21 at 17:58 +1000, Rusty Russell wrote:
> >
> >> Partly the horror of the code, but mainly because it is an in-order
> >> ring.  You'll note that we use a reply ring, so we don't need to know
> >> how much the other side has consumed (and it needn't do so in order).
> >>
> >
> > I have certainly been known to take a similar stance when looking at Xen
> > code ;) (recall the lapic work I did).  However, that said I am not yet
> > convinced that an out-of-order ring (at least as a fundamental
> > primitive) buys us much.  
> 
> It's pretty much required for block I/O into disk arrays.

You are misunderstanding me.  I totally agree that block-io is
inherently out-of-order.  What I am trying to convey is that at a
fundamental level *everything* (including block-io) can be viewed as an
ordered sequence of events.

For instance, consider that a block-io driver is making requests like
"perform read transaction X", and "perform write transaction Y".
Likewise, the host side can pass events like "completed transaction Y"
and "completed transaction X".  At this level, everything is *always*
ordered, regardless of the fact that X and Y were temporally rearranged
by the host.
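
To make the distinction concrete, here is a minimal Python sketch.  It is
not IOQ code: the OrderedChannel class and the event tuples are invented
purely for illustration.  The host finishes Y before X, yet each channel
still delivers its events in exactly the order they were inserted.

```python
from collections import deque


class OrderedChannel:
    """Hypothetical FIFO event channel: events leave in insertion order."""

    def __init__(self):
        self._fifo = deque()

    def send(self, event):
        self._fifo.append(event)

    def recv(self):
        return self._fifo.popleft() if self._fifo else None


to_host = OrderedChannel()   # guest -> host events
to_guest = OrderedChannel()  # host -> guest events

# The guest issues two transactions, X first and then Y.
to_host.send(("read", "X"))
to_host.send(("write", "Y"))

# The host sees the requests in the order they were sent.
requests = [to_host.recv(), to_host.recv()]

# Assume the host happens to finish Y before X.  It reports each
# completion as it occurs, so the completion *events* are still an
# ordered sequence even though the work was rearranged.
for _, tag in reversed(requests):
    to_guest.send(("completed", tag))

completions = [to_guest.recv(), to_guest.recv()]
print(requests)
print(completions)
```

The reordering lives entirely in which tag each completion event names,
not in the channel itself.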

This is what the ioq/pvbus series is trying to address: low-level
primitives for moving events in and out of the guest in a VMM-agnostic
way.  From there, you could apply higher-level constructs such as an
out-of-order sg descriptor ring to represent your block-io data.  The
low-level primitives simply become a way to convey changes to that
construct.

In a nutshell, IOQ provides a simple bi-directional ordered event
channel and a context-associated hypercall mechanism (see
pvbus_device->call()) to accomplish these low-level chores.

I am also advocating caution on the tx path, as I think indirection
(e.g. queuing) as opposed to direct access (e.g. contextual hypercall)
has limited applicability.  Trying to come up with a complex
"one-size-fits-all" queue for the tx path may not be worthwhile, since
in the end there is still a 1:1 ratio of queue-insert:hypercall.  You
might as well just pass the descriptor directly via the contextual
hypercall.  Where queuing ends up being a win is where you can do the
bi-dir NAPI-like tricks like IOQNET and have the queue-insert to
hypercall ratio become > 1.
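
As a rough illustration of that ratio, here is a toy Python model.  The
TxQueue class, the host_polling flag and the drain_every parameter are
all hypothetical, and the notification-suppression scheme is only an
assumption about what "NAPI-like" means here, not a description of how
IOQNET is implemented.  The guest only issues a hypercall when the host
is not already polling the queue.

```python
class TxQueue:
    """Toy tx queue that counts inserts and hypercalls (hypothetical)."""

    def __init__(self):
        self.pending = []
        self.host_polling = False  # True while the host is already draining
        self.inserts = 0
        self.hypercalls = 0

    def insert(self, desc):
        self.pending.append(desc)
        self.inserts += 1
        if not self.host_polling:
            # The host is idle, so the guest must "kick" it with a hypercall.
            self.hypercalls += 1
            self.host_polling = True

    def host_drain(self):
        # The host consumes everything queued so far, then stops polling.
        drained, self.pending = self.pending, []
        self.host_polling = False
        return drained


def insert_to_hypercall_ratio(packets, drain_every):
    """Insert `packets` descriptors; the host drains every `drain_every`."""
    q = TxQueue()
    for i in range(packets):
        q.insert(i)
        if (i + 1) % drain_every == 0:
            q.host_drain()
    return q.inserts / q.hypercalls


print(insert_to_hypercall_ratio(8, 1))  # 1.0: one hypercall per insert
print(insert_to_hypercall_ratio(8, 4))  # 4.0: one hypercall per burst of 4
```

In the first case the queue adds indirection without saving a single
hypercall.  Only in the second case, where inserts pile up while the
host is still polling, does the queue pay for itself.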

> 
> Xen does out-of-order, btw, on its single ring, but at the cost of some 
> complexity.  I don't believe it is worthwhile and prefer split 
> request/reply rings.

I am not against the split rings either.  The article that Rusty
forwarded was very interesting indeed.  But if I understood the article
and Rusty, there are two aspects to it: (A) using two rings to make a
cache-thrash-friendly ordered ring, and (B) adding out-of-order
capability to these two rings.  I am certainly in favor of (A) for use
as the low-level event transport.  I just question whether the
complexity of (B) is justified as the one and only queuing mechanism
when there are plenty of patterns that simply cannot take advantage of
it.
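
For reference, (A) could look roughly like the Python sketch below.  The
SpscRing name and its fields are made up for illustration, and plain
Python cannot show the cache behaviour itself.  The only point is the
ownership rule: on each ring, `head` is written by the producer alone
and `tail` by the consumer alone, and entries are consumed strictly in
order.

```python
class SpscRing:
    """Hypothetical in-order ring with one producer and one consumer."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0  # written only by the producer
        self.tail = 0  # written only by the consumer

    def push(self, item):
        if self.head - self.tail == len(self.slots):
            return False  # ring is full
        self.slots[self.head % len(self.slots)] = item
        self.head += 1
        return True

    def pop(self):
        if self.tail == self.head:
            return None  # ring is empty
        item = self.slots[self.tail % len(self.slots)]
        self.tail += 1
        return item


requests = SpscRing(4)  # guest produces, host consumes
replies = SpscRing(4)   # host produces, guest consumes

# The fifth push finds the four-entry ring full and is rejected.
accepted = [requests.push(n) for n in range(5)]

# The host answers each request in the order it was queued.
req = requests.pop()
while req is not None:
    replies.push(("done", req))
    req = requests.pop()

answers = [replies.pop() for _ in range(4)]
print(accepted)
print(answers)
```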

What I am wondering is whether we should have a set of low-level
primitives that deal primarily with ordered event sequencing and VMM
abstraction, and a higher-level set of code expressed in terms of these
primitives for implementing constructs such as (B) for block-io.
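
A minimal Python sketch of that layering, under stated assumptions:
OrderedEvents stands in for the low-level ordered primitive and
BlockRequestTable for a higher-level out-of-order construct.  Both names
and their methods are hypothetical and do not come from the patch
series.  The ordered channels only ever carry slot numbers; the
out-of-order bookkeeping lives in the table above them.

```python
from collections import deque


class OrderedEvents:
    """Low-level primitive (hypothetical): a strictly ordered event FIFO."""

    def __init__(self):
        self._fifo = deque()

    def push(self, event):
        self._fifo.append(event)

    def pop(self):
        return self._fifo.popleft() if self._fifo else None


class BlockRequestTable:
    """Higher-level construct (hypothetical): slots complete in any order."""

    def __init__(self, size, to_host):
        self.slots = [None] * size
        self.free = deque(range(size))
        self.to_host = to_host

    def submit(self, request):
        slot = self.free.popleft()
        self.slots[slot] = request
        self.to_host.push(slot)  # ordered event: "slot N was posted"
        return slot

    def complete(self, slot):
        request, self.slots[slot] = self.slots[slot], None
        self.free.append(slot)
        return request


to_host, to_guest = OrderedEvents(), OrderedEvents()
table = BlockRequestTable(4, to_host)
for req in ("read A", "write B", "read C"):
    table.submit(req)

# The host learns about the slots in submission order...
seen_by_host = [to_host.pop() for _ in range(3)]

# ...but is assumed to finish them in a different order, and reports
# each completion as an ordered event naming the finished slot.
for slot in (seen_by_host[2], seen_by_host[0], seen_by_host[1]):
    to_guest.push(slot)

finished = [table.complete(to_guest.pop()) for _ in range(3)]
print(finished)
```

A pattern that never completes out of order would use OrderedEvents
directly and skip the table entirely.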

> 
> With my VJ T-shirt on, I can even say it's more efficient, as each side 
> of the ring will have a single writer and a single reader, reducing 
> ping-pong effects if the interrupt completions happen to land on the 
> wrong cpu.

Agreed.

> 
> Network tx can be out of order too (with some traffic destined to other 
> guests, some to the host, and some to external interfaces, completions 
> will be out of order).

Well, not with respect to the 1:1 event delivery channel as I envision
it (unless I am misunderstanding you?).

Regards,
-Greg
