Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

On Tue, 06 Dec 2011 11:58:21 +0200, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 12/06/2011 07:07 AM, Rusty Russell wrote:
> > Yes, but the hypervisor/trusted party would simply have to do the copy;
> > the rings themselves would be shared.  A would say "copy this to/from
> > B's ring entry N" and you know that A can't have changed B's entry.
> 
> Sorry, I don't follow.  How can the rings be shared?  If A puts a gpa in
> A's address space into the ring, there's no way B can do anything with
> it, it's an opaque number.  Xen solves this with an extra layer of
> indirection (grant table handles) that cost extra hypercalls to map or
> copy.

It's not symmetric.  B can see the desc and avail pages R/O, and the
used page R/W.  It needs to ask something to copy in/out of
descriptors, though, because the addresses in them are opaque numbers
to it, and it doesn't have access.  i.e. the existence of the descriptor
in the ring *implies* a grant.
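
A rough sketch of the asymmetric view described above, for illustration
only.  All names here (peer_access_for, sketch_desc, copy_permitted) are
hypothetical and not part of virtio or any hypervisor interface; the
check assumes a trusted copier that validates each request against the
descriptor, which is what "implies a grant" would amount to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The three parts of A's ring, as named in the mail. */
enum ring_part { RING_DESC, RING_AVAIL, RING_USED };

/* Hypothetical access levels for the peer (B). */
enum peer_access { PEER_NONE, PEER_RO, PEER_RW };

/* B's view of A's ring: desc and avail read-only, used read-write. */
static enum peer_access peer_access_for(enum ring_part part)
{
	switch (part) {
	case RING_DESC:
	case RING_AVAIL:
		return PEER_RO;
	case RING_USED:
		return PEER_RW;
	}
	return PEER_NONE;
}

/* Simplified descriptor: an address in A's space that B cannot use. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	bool peer_writable;	/* true if B may fill this buffer */
};

/*
 * Check a trusted copier might make before copying on B's behalf.
 * The descriptor's presence in the ring is the grant: the copy must
 * stay inside the buffer and match its direction.
 */
static bool copy_permitted(const struct sketch_desc *d, uint32_t off,
			   uint32_t len, bool copy_into_buffer)
{
	assert(d != NULL);
	if (copy_into_buffer != d->peer_writable)
		return false;
	return off <= d->len && len <= d->len - off;
}
```

The bounds test is written as `len <= d->len - off` so that it cannot
overflow when `off + len` would wrap.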

Perhaps this could be generalized further into a "connect these two
rings", but I'm not sure.  Descriptors with both read and write parts
are tricky.

> > Every driver really wants to put a pointer in there.  We have an array
> > to map desc. numbers to cookies inside the virtio core.
> >
> > We really want 64 bits.
> 
> With multiqueue, it may be cheaper to do the extra translation locally
> than to ship the cookie across cores (or, more likely, it will make no
> difference).

Indeed.
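
For readers following along, the "array to map desc. numbers to cookies"
could look roughly like the sketch below.  This is a simplified model,
not the actual virtio core code: RING_SIZE, token_table, token_store and
token_take are made-up names, and the ring size is an arbitrary
assumption.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8	/* assumed ring size, for illustration only */

/* One driver cookie per descriptor head, private to this side. */
struct token_table {
	void *token[RING_SIZE];
};

/* Remember the driver's cookie when a buffer is added at 'head'. */
static void token_store(struct token_table *t, unsigned int head,
			void *cookie)
{
	assert(head < RING_SIZE);
	assert(t->token[head] == NULL);	/* slot must be free */
	t->token[head] = cookie;
}

/*
 * Translate a used-ring id back to the cookie and free the slot.
 * The id comes from the other side, so it is range-checked here
 * rather than trusted; a bad or stale id yields NULL.
 */
static void *token_take(struct token_table *t, unsigned int head)
{
	void *cookie;

	if (head >= RING_SIZE)
		return NULL;
	cookie = t->token[head];
	t->token[head] = NULL;
	return cookie;
}
```

Keeping the pointer on the local side means only a small index crosses
the ring, which is why this still works when the other side is not
trusted; putting the 64-bit pointer in the ring itself would not.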

> However, moving pointers only works if you trust the other side.  That
> doesn't work if we manage to share a ring.

Yes, that part needs to be trusted too.

> > I'm just not sure how the host would even know to hint.
> 
> For JBOD storage, a good rule of thumb is (number of spindles) x 3. 
> With less, you might leave an idle spindle; with more, you're just
> adding latency.  This assumes you're using indirects so ring entry ==
> request.  The picture is muddier with massive battery-backed RAID
> controllers or flash.
> 
> For networking, you want (ring size) * min(expected packet size, page
> size) / (link bandwidth) to be something that doesn't get the
> bufferbloat people after your blood.

OK, so while neither side knows, the host knows slightly more.
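
To put numbers on the two rules of thumb quoted above, here is a small
worked sketch.  The function names and the choice of microseconds and
bytes per second as units are assumptions made for the example; only the
formulas come from the mail.

```c
#include <assert.h>
#include <stdint.h>

/* JBOD rule of thumb from the mail: about three requests per spindle. */
static unsigned int jbod_queue_depth(unsigned int spindles)
{
	return spindles * 3;
}

/*
 * Time for a full ring's worth of data to drain at line rate, in
 * microseconds:
 *   ring_size * min(pkt_bytes, page_bytes) / link_bytes_per_sec
 */
static uint64_t ring_drain_time_us(uint64_t ring_size, uint64_t pkt_bytes,
				   uint64_t page_bytes,
				   uint64_t link_bytes_per_sec)
{
	uint64_t per_entry = pkt_bytes < page_bytes ? pkt_bytes : page_bytes;

	assert(link_bytes_per_sec > 0);
	return ring_size * per_entry * 1000000ULL / link_bytes_per_sec;
}
```

For example, a 4-spindle JBOD gives a depth of 12.  A 256-entry ring of
1500-byte packets on a 1 Gbit/s link (125,000,000 bytes/s) gives
256 * 1500 / 125,000,000 s = 3072 us, i.e. about 3 ms of queued data.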

Now that I think about it, from a spec POV, saying it's a "hint" is useless,
as it doesn't tell the driver what to do with it.  I'll say it's a
maximum, which keeps it simple.
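
As an illustration of why a maximum is simple for the driver to act on,
a sketch under the assumption that the value caps the number of ring
entries in flight.  The function and parameter names are hypothetical,
and nothing here is taken from a spec.

```c
#include <assert.h>

/*
 * With a maximum there is exactly one thing for the driver to do:
 * never exceed it (nor the ring size itself).
 */
static unsigned int in_flight_limit(unsigned int driver_wants,
				    unsigned int host_max,
				    unsigned int ring_size)
{
	unsigned int limit = driver_wants;

	assert(ring_size > 0);
	if (limit > host_max)
		limit = host_max;
	if (limit > ring_size)
		limit = ring_size;
	return limit;
}
```

A "hint" would leave open whether the driver may go above, below, or
ignore the value; a clamp like this has no such ambiguity.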

Cheers,
Rusty.
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

