On 12/06/2011 02:03 PM, Rusty Russell wrote:
> On Tue, 06 Dec 2011 11:58:21 +0200, Avi Kivity <avi@xxxxxxxxxx> wrote:
> > On 12/06/2011 07:07 AM, Rusty Russell wrote:
> > > Yes, but the hypervisor/trusted party would simply have to do the
> > > copy; the rings themselves would be shared.  A would say "copy this
> > > to/from B's ring entry N" and you know that A can't have changed
> > > B's entry.
> >
> > Sorry, I don't follow.  How can the rings be shared?  If A puts a gpa
> > from A's address space into the ring, there's no way B can do anything
> > with it; it's an opaque number.  Xen solves this with an extra layer
> > of indirection (grant table handles) that costs extra hypercalls to
> > map or copy.
>
> It's not symmetric.  B can see the desc and avail pages R/O, and the
> used page R/W.  It needs to ask something to copy in/out of
> descriptors, though, because they're opaque numbers, and it doesn't
> have access.  i.e. the existence of the descriptor in the ring
> *implies* a grant.
>
> Perhaps this could be generalized further into "connect these two
> rings", but I'm not sure.  Descriptors with both read and write parts
> are tricky.

Okay, I was using a wrong mental model of how this works.  B must be
aware of the translation from A's address space into B's.  Both qemu and
the kernel can do this on their own, but if B is another guest, it
cannot do so except by calling H.

vhost-copy cannot work fully transparently, because you need some memory
to copy into.  Maybe we can have a pci device with a large BAR that
contains buffers for copying, and also a translation from A addresses
into B addresses.
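To make the translation part concrete, here is a rough sketch of what
such a table might look like.  None of these names exist anywhere in
virtio or vhost; struct and function names are invented purely for
illustration:

```c
/* Hypothetical sketch of the "virtio-copy" BAR idea -- invented names,
 * illustration only. */
#include <stdint.h>

/* One translation entry: a range of guest-A physical addresses that
 * vhost-copy has mirrored into B's BAR. */
struct vcopy_xlate {
	uint64_t a_gpa;     /* start of range in A's physical address space */
	uint64_t b_offset;  /* offset of the mirrored copy inside B's BAR   */
	uint32_t len;       /* length of the mirrored range, in bytes       */
};

/*
 * Translate an address found in one of A's descriptors into something
 * B can actually dereference: an offset into its own BAR.  Returns -1
 * if the address was never mirrored, in which case B must not touch it.
 */
static int64_t vcopy_a_to_b(const struct vcopy_xlate *tbl, int n,
			    uint64_t a_gpa)
{
	int i;

	for (i = 0; i < n; i++) {
		if (a_gpa >= tbl[i].a_gpa &&
		    a_gpa < tbl[i].a_gpa + tbl[i].len)
			return (int64_t)(tbl[i].b_offset +
					 (a_gpa - tbl[i].a_gpa));
	}
	return -1;
}
```

The point is only that B never sees A's gpas directly; it sees BAR
offsets that vhost-copy has already populated.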
It would work something like this:

- A prepares a request with both out and in buffers
- vhost-copy allocates memory in B's virtio-copy BAR, copies (using a
  DMA engine) the out buffers into it, and rewrites the out descriptors
  to contain B addresses
- B services the request, and updates the in addresses in the
  descriptors to point at B memory
- vhost-copy copies (using a DMA engine) the in buffers into A memory

> > > I'm just not sure how the host would even know to hint.
> >
> > For JBOD storage, a good rule of thumb is (number of spindles) x 3.
> > With less, you might leave an idle spindle; with more, you're just
> > adding latency.  This assumes you're using indirects, so ring entry
> > == request.  The picture is muddier with massive battery-backed RAID
> > controllers or flash.
> >
> > For networking, you want (ring size) * min(expected packet size, page
> > size) / (link bandwidth) to be something that doesn't get the
> > bufferbloat people after your blood.
>
> OK, so while neither side knows, the host knows slightly more.
>
> Now I think about it, from a spec POV, saying it's a "hint" is useless,
> as it doesn't tell the driver what to do with it.  I'll say it's a
> maximum, which keeps it simple.

Those rules of thumb always have exceptions; I'd say it's a default
that the guest can override.

-- 
error compiling committee.c: too many arguments to function
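For what it's worth, the two rules of thumb work out like this when
written as arithmetic.  The helper names and the 1 ms queue-delay
target are made up for the example; the thread only says the queue
shouldn't upset the bufferbloat people:

```c
#include <stdint.h>

/* JBOD storage: ~3 outstanding requests per spindle, assuming
 * indirect descriptors so one ring entry == one request. */
static uint32_t storage_ring_hint(uint32_t spindles)
{
	return spindles * 3;
}

/*
 * Networking: (ring size) * min(packet size, page size) / bandwidth
 * is the worst-case queue drain time.  Solve for the ring size that
 * keeps that drain time under target_us microseconds.
 */
static uint32_t net_ring_hint(uint64_t link_bps, uint32_t pkt_bytes,
			      uint32_t page_bytes, uint32_t target_us)
{
	uint32_t buf = pkt_bytes < page_bytes ? pkt_bytes : page_bytes;

	/* bytes the link drains in target_us, divided by buffer size */
	return (uint32_t)(link_bps / 8 * target_us / 1000000 / buf);
}
```

So an 8-spindle JBOD suggests a ring of 24, while 10 Gbit/s with
1500-byte packets and a 1 ms budget suggests a ring of roughly 800
entries; which is exactly why a single spec-mandated number can only be
a maximum (or, as above, a default).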