Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

On Tue, 06 Dec 2011 11:58:21 +0200, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 12/06/2011 07:07 AM, Rusty Russell wrote:
> > Yes, but the hypervisor/trusted party would simply have to do the copy;
> > the rings themselves would be shared.  A would say "copy this to/from B's
> > ring entry N" and you know that A can't have changed B's entry.
> 
> Sorry, I don't follow.  How can the rings be shared?  If A puts a gpa in
> A's address space into the ring, there's no way B can do anything with
> it, it's an opaque number.  Xen solves this with an extra layer of
> indirection (grant table handles) that cost extra hypercalls to map or
> copy.

It's not symmetric.  B can see the desc and avail pages R/O, and the
used page R/W.  It needs to ask something (the hypervisor/trusted party)
to copy in/out of descriptors, though, because the addresses in them are
opaque numbers to B, and it doesn't have access.  i.e. the existence of
the descriptor in the ring *implies* a grant.
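
To make the "descriptor in the ring implies a grant" idea concrete, here
is a rough toy model in Python.  It is only a sketch: TrustedCopier, Desc,
copy_out and copy_in are hypothetical names made up for illustration, not
anything in virtio or KVM, and A's memory is modelled as a plain bytearray.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Desc:
    addr: int       # address in A's space; an opaque number as far as B is concerned
    length: int
    writable: bool  # True if B is allowed to fill this buffer


class TrustedCopier:
    """Hypothetical stand-in for the hypervisor/trusted party."""

    def __init__(self, a_memory, desc_table):
        self._mem = a_memory     # only the trusted party can touch A's memory
        self._desc = desc_table  # A's descriptor table (B sees it read-only)
        self.avail = set()       # descriptor indices A has published

    def _granted(self, n):
        # Presence in the ring is the grant; there is no separate grant table.
        if n not in self.avail:
            raise PermissionError("descriptor %d not in ring: no grant" % n)
        return self._desc[n]

    def copy_out(self, n):
        """Copy the buffer behind A's ring entry n out to B."""
        d = self._granted(n)
        return bytes(self._mem[d.addr:d.addr + d.length])

    def copy_in(self, n, data):
        """Copy B's data into the buffer behind A's ring entry n."""
        d = self._granted(n)
        if not d.writable:
            raise PermissionError("descriptor %d is read-only for B" % n)
        if len(data) > d.length:
            raise ValueError("data longer than descriptor %d" % n)
        self._mem[d.addr:d.addr + len(data)] = data
        return len(data)
```

B never dereferences addr itself; it only names an entry number N, and
the copy is refused unless A has actually put that entry in the ring.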

Perhaps this could be generalized further into a "connect these two
rings", but I'm not sure.  Descriptors with both read and write parts
are tricky.

> > Every driver really wants to put a pointer in there.  We have an array
> > to map desc. numbers to cookies inside the virtio core.
> >
> > We really want 64 bits.
> 
> With multiqueue, it may be cheaper to do the extra translation locally
> than to ship the cookie across cores (or, more likely, it will make no
> difference).

Indeed.

> However, moving pointers only works if you trust the other side.  That
> doesn't work if we manage to share a ring.

Yes, that part needs to be trusted too.

> > I'm just not sure how the host would even know to hint.
> 
> For JBOD storage, a good rule of thumb is (number of spindles) x 3. 
> With less, you might leave an idle spindle; with more, you're just
> adding latency.  This assumes you're using indirects so ring entry ==
> request.  The picture is muddier with massive battery-backed RAID
> controllers or flash.
> 
> For networking, you want (ring size) * min(expected packet size, page
> size) / (link bandwidth) to be something that doesn't get the
> bufferbloat people after your blood.

OK, so while neither side knows, the host knows slightly more.
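
For reference, the two rules of thumb above work out roughly as follows.
This is just the arithmetic as I read it, in Python; the function names,
the 4096-byte default page size and the choice of bits per second for
link bandwidth are my assumptions, not part of the proposal.

```python
def jbod_queue_depth(spindles):
    """Rule of thumb for JBOD storage: (number of spindles) x 3.

    Assumes indirect descriptors, so one ring entry == one request.
    """
    return spindles * 3


def ring_drain_seconds(ring_size, expected_packet_bytes,
                       link_bits_per_second, page_bytes=4096):
    """Time to drain a full ring onto the link.

    (ring size) * min(expected packet size, page size) / (link bandwidth),
    with bytes converted to bits because bandwidth is assumed to be in bit/s.
    """
    bytes_queued = ring_size * min(expected_packet_bytes, page_bytes)
    return bytes_queued * 8 / link_bits_per_second


if __name__ == "__main__":
    print(jbod_queue_depth(4))
    print(ring_drain_seconds(256, 1500, 1_000_000_000))
```

With a 256-entry ring and 1500-byte packets, that is 384000 bytes queued:
about 3 ms on a 1 Gbit/s link, but about 0.3 s at 10 Mbit/s, which is the
kind of number the bufferbloat people would object to.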

Now I think about it, from a spec POV, saying it's a "hint" is useless,
as it doesn't tell the driver what to do with it.  I'll say it's a
maximum, which keeps it simple.
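
A driver honouring such a maximum might do something like the following
(a sketch only, in Python; may_submit and host_max are hypothetical names,
and how the value is actually exposed is not decided here):

```python
def may_submit(in_flight, ring_size, host_max=None):
    """Return True if the driver may queue one more request.

    host_max is the host-supplied maximum; None means the host gave none,
    in which case only the ring size limits the driver.
    """
    limit = ring_size if host_max is None else min(ring_size, host_max)
    return in_flight < limit
```

The point is that a maximum gives the driver exactly one obvious thing to
do with the number, which a "hint" does not.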

Cheers,
Rusty.