Rusty Russell wrote:
> On Wednesday 01 April 2009 05:12:47 Gregory Haskins wrote:
>
>> Bare metal:  tput = 4078Mb/s, round-trip = 25593pps (39us rtt)
>> Virtio-net:  tput = 4003Mb/s, round-trip = 320pps (3125us rtt)
>> Venet:       tput = 4050Mb/s, round-trip = 15255pps (65us rtt)
>
> That rtt time is awful.  I know the notification suppression heuristic
> in qemu sucks.
>
> I could dig through the code, but I'll ask directly: what heuristic do
> you use for notification prevention in your venet_tap driver?

I am not 100% sure I know what you mean by "notification prevention", but let me take a stab at it.

Like most of these kinds of constructs, I have two rings (rx + tx on the guest is reversed to tx + rx on the host), each of which can signal in either direction, for a total of 4 events, 2 on each side of the connection.  I utilize what I call "bidirectional napi" so that only the first packet submitted needs to signal across the guest/host boundary.  E.g. the first ingress packet injects an interrupt, then does a napi_schedule and masks future irqs.  Likewise, the first egress packet does a hypercall, then does a "napi_schedule" (I don't actually use napi in this path, but it's conceptually identical) and masks future hypercalls.

So that is my first form of what I would call notification prevention.

The second form occurs on the "tx-complete" path (that is, guest->host tx).  I only signal back to the guest to reclaim its skbs every 10 packets, or when I drain the queue, whichever comes first (note to self: make this # configurable).

The nice part about this scheme is that it significantly reduces the number of guest/host transitions while still providing the lowest possible latency for single packets.  E.g. send one packet, and you get one hypercall and one tx-complete interrupt as soon as it queues on the hardware.  Send 100 packets, and you get one hypercall and 10 tx-complete interrupts, as frequently as every tenth packet queues on the hardware.  There is no timer governing the flow, etc.

Is that what you were asking?  (I've put a rough sketch of both mechanisms in a P.S. below.)

> As you point out, 350-450 is possible, which is still bad, and it's at least
> partially caused by the exit to userspace and two system calls.  If virtio_net
> had a backend in the kernel, we'd be able to compare numbers properly.

:) But that is the whole point, isn't it?  I created vbus specifically as a framework for putting things in the kernel, and that *is* one of the major reasons it is faster than virtio-net... it's not the difference in, say, IOQs vs virtio-ring (though note I also think some of the innovations we have added, such as bidirectional napi, are helping too, but those are not "in-kernel"-specific features and could probably help the userspace version as well).

I would be entirely happy if you guys accepted the general concept and framework of vbus, and then worked with me to convert what I have as "venet-tap" into essentially an in-kernel virtio-net.  I am not specifically interested in creating a competing pv-net driver... I just needed something to showcase the concepts, and I didn't want to hack the virtio-net infrastructure to do it until I had everyone's blessing.

Note to maintainers: I *am* perfectly willing to maintain the venet drivers if, for some reason, we decide that we want to keep them as is.  It's just an ideal of mine to collapse virtio-net and venet-tap together, and I suspect our community would prefer this as well.

-Greg
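
P.S. To make the "only the first packet signals" idea concrete, here is a
minimal user-space model of the suppression logic.  The names (struct ring,
ring_produce, ring_poll) are made up for illustration; this is just the shape
of the scheme, not the actual venet-tap code.

/*
 * Toy model of the "bidirectional napi" suppression described above.
 * The producer signals only when notifications are unmasked; the
 * consumer re-enables them once it has drained the ring.
 */
#include <stdbool.h>
#include <stdio.h>

struct ring {
	int pending;            /* packets queued but not yet consumed      */
	bool notifications_on;  /* may the producer signal the consumer?    */
	int signals_sent;       /* interrupts/hypercalls actually raised    */
};

/* Producer side: queue a packet, signalling only for the first one. */
static void ring_produce(struct ring *r)
{
	r->pending++;
	if (r->notifications_on) {
		r->signals_sent++;           /* inject irq / issue hypercall */
		r->notifications_on = false; /* mask until consumer drains   */
	}
}

/* Consumer side: napi_schedule()-style polling, then unmask. */
static void ring_poll(struct ring *r)
{
	while (r->pending)
		r->pending--;            /* process one packet               */
	r->notifications_on = true;      /* next packet will signal again    */
}

int main(void)
{
	struct ring rx = { .notifications_on = true };
	int i;

	/* A burst of 100 ingress packets costs exactly one notification. */
	for (i = 0; i < 100; i++)
		ring_produce(&rx);
	printf("signals for 100-packet burst: %d\n", rx.signals_sent);

	ring_poll(&rx);

	/* The very next lone packet still gets an immediate signal. */
	ring_produce(&rx);
	printf("signals after one more packet: %d\n", rx.signals_sent);
	return 0;
}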
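
And the tx-complete side, with the same caveat that the function names are
hypothetical; this only models the "every 10 packets or on queue drain"
batching arithmetic described above.  Compiled and run, the counts line up
with the one-hypercall / ten-interrupt numbers I quoted.

/*
 * Toy model of tx-complete batching: the host reclaims guest skbs but
 * only interrupts the guest every TX_COMPLETE_BATCH packets, or when
 * the queue drains, whichever comes first.
 */
#include <stdio.h>

#define TX_COMPLETE_BATCH 10    /* the "every 10 packets" from above */

/* Returns the number of tx-complete interrupts raised for 'queued' packets. */
static int drain_tx_queue(int queued)
{
	int interrupts = 0, since_signal = 0;

	while (queued--) {
		since_signal++;         /* one skb handed to the hardware */
		if (since_signal == TX_COMPLETE_BATCH || queued == 0) {
			interrupts++;   /* signal guest to reclaim skbs   */
			since_signal = 0;
		}
	}
	return interrupts;
}

int main(void)
{
	printf("1 packet    -> %d tx-complete interrupt(s)\n", drain_tx_queue(1));
	printf("100 packets -> %d tx-complete interrupt(s)\n", drain_tx_queue(100));
	return 0;
}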