On 08/18/2009 11:59 PM, Ira W. Snyder wrote:
On a non shared-memory system (where the guest's RAM is not just a chunk
of userspace RAM in the host system), virtio's management model seems to
fall apart. Feature negotiation doesn't work as one would expect.
In your case, virtio-net on the main board accesses PCI config space
registers to perform the feature negotiation; software on your PCI cards
needs to trap these config space accesses and respond to them according
to virtio ABI.
(There's no real guest on your setup, right? Just a kernel running on
an x86 system and other kernels running on the PCI cards?)
This does appear to be solved by vbus, though I haven't written a
vbus-over-PCI implementation, so I cannot be completely sure.
Even if virtio-pci doesn't work out for some reason (though it should),
you can write your own virtio transport and implement its config space
however you like.
I'm not at all clear on how to get feature negotiation to work on a
system like mine. From my study of lguest and kvm (see below) it looks
like userspace will need to be involved, via a miscdevice.
I don't see why. Is the kernel on the PCI cards in full control of all
accesses?
Ok. I thought I should at least express my concerns while we're
discussing this, rather than being too late after finding the time to
study the driver.
Off the top of my head, I would think that transporting userspace
addresses in the ring (for copy_(to|from)_user()) vs. physical addresses
(for DMAEngine) might be a problem. Pinning userspace pages into memory
for DMA is a bit of a pain, though it is possible.
Oh, the ring doesn't transport userspace addresses. It transports guest
addresses, and it's up to vhost to do something with them.
Currently vhost supports two translation modes:
1. virtio address == host virtual address (using copy_to_user)
2. virtio address == offsetted host virtual address (using copy_to_user)
The latter mode is used for kvm guests (with multiple offsets, skipping
some details).
I think you need to add a third mode, virtio address == host physical
address (using dma engine). Once you do that, and wire up the
signalling, things should work.
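
To make the three modes concrete, here is a rough userspace sketch of the
translation step. Everything in it (xlate_mode, xlate_region, xlate) is a
hypothetical name made up for illustration, not vhost's actual code, and
the host-physical mode is only the proposed addition; it just flags that
the result should go to a DMA engine instead of a copy_to_user-style copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is taken from vhost. */
enum xlate_mode {
    XLATE_HVA,         /* virtio address == host virtual address */
    XLATE_HVA_OFFSET,  /* virtio address + region offset == host virtual address */
    XLATE_HPA          /* proposed: virtio address == host physical address */
};

struct xlate_region {
    uint64_t guest_start;  /* first virtio address covered by this region */
    uint64_t size;         /* region length in bytes */
    uint64_t host_start;   /* host address that guest_start maps to */
};

struct xlate_result {
    int      ok;       /* 1 if the address could be translated, 0 otherwise */
    int      use_dma;  /* 1: hand addr to a DMA engine; 0: copy via the CPU */
    uint64_t addr;     /* translated address */
};

static struct xlate_result
xlate(enum xlate_mode mode, const struct xlate_region *regions,
      size_t nregions, uint64_t vaddr)
{
    struct xlate_result r = { 0, 0, 0 };
    size_t i;

    switch (mode) {
    case XLATE_HVA:
        /* Identity mapping, data moved with a CPU copy. */
        r.ok = 1;
        r.addr = vaddr;
        return r;
    case XLATE_HPA:
        /* Identity mapping, but the address is physical: use DMA. */
        r.ok = 1;
        r.use_dma = 1;
        r.addr = vaddr;
        return r;
    case XLATE_HVA_OFFSET:
        assert(regions != NULL || nregions == 0);
        /* Find the region containing vaddr and apply its offset. */
        for (i = 0; i < nregions; i++) {
            const struct xlate_region *reg = &regions[i];

            if (vaddr >= reg->guest_start &&
                vaddr - reg->guest_start < reg->size) {
                r.ok = 1;
                r.addr = reg->host_start + (vaddr - reg->guest_start);
                return r;
            }
        }
        return r;  /* not mapped: ok stays 0 */
    }
    return r;
}
```

The region table stands in for the "multiple offsets" case; a single
region with one offset is the simplest form of mode 2.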
There is also the problem of different endianness between host and guest
in virtio-net. The struct virtio_net_hdr (include/linux/virtio_net.h)
defines fields in host byte order, which totally breaks if the guest has
a different endianness. This is a virtio-net problem though, and is not
transport specific.
Yeah. You'll need to add byteswaps.
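
As a rough idea of what the byteswaps amount to, here is a standalone
sketch. struct example_net_hdr is a hypothetical mirror of the
virtio_net_hdr field layout, and swab16_example() and
example_net_hdr_swab() are made-up helpers; in the kernel you would use
the existing byteorder helpers rather than open-coding the swap, and
where the swap is applied (which side, and when) is left open here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the virtio_net_hdr layout, for illustration. */
struct example_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* Swap the two bytes of a 16-bit value. */
static uint16_t swab16_example(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Convert a header between byte orders in place. Only the 16-bit fields
 * change; the single-byte fields are the same in either byte order.
 */
static void example_net_hdr_swab(struct example_net_hdr *h)
{
    assert(h != NULL);
    h->hdr_len     = swab16_example(h->hdr_len);
    h->gso_size    = swab16_example(h->gso_size);
    h->csum_start  = swab16_example(h->csum_start);
    h->csum_offset = swab16_example(h->csum_offset);
}
```

Applying the swap twice gives back the original header, so the same
helper works in both directions.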
I've browsed over both the kvm and lguest code, and it looks like they
each re-invent a mechanism for transporting interrupts between the host
and guest, using eventfd. They both do this by implementing a
miscdevice, which is basically their management interface.
See drivers/lguest/lguest_user.c (see write() and LHREQ_EVENTFD) and
kvm-kmod-devel-88/x86/kvm_main.c (see kvm_vm_ioctl(), called via
kvm_dev_ioctl()) for how they hook up eventfds.
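
For reference, the userspace view of an eventfd is small. The sketch
below only shows the counter semantics that make it usable as an
interrupt-like signal; example_notify() and example_wait() are
hypothetical helpers, and this is not how lguest or kvm wire the fd up
on the kernel side.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Signal the other side: adds 1 to the eventfd counter. 0 on success. */
static int example_notify(int efd)
{
    uint64_t one = 1;

    assert(efd >= 0);
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consume pending signals: returns the number of notifications since the
 * last read and resets the counter to zero. Returns -1 on error, which
 * includes "nothing pending" when the fd was created with EFD_NONBLOCK.
 */
static int64_t example_wait(int efd)
{
    uint64_t count;

    assert(efd >= 0);
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

A caller would create the descriptor with eventfd(0, 0), or with
EFD_NONBLOCK if it polls, and pass it to whichever side is supposed to
raise the signal.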
I can now imagine how two userspace programs (host and guest) could work
together to implement a management interface, including hotplug of
devices, etc. Of course, this would basically reinvent the vbus
management interface into a specific driver.
You don't need anything in the guest userspace (virtio-net) side.
I think this is partly what Greg is trying to abstract out into generic
code. I haven't studied the actual data transport mechanisms in vbus,
though I have studied virtio's transport mechanism. I think a generic
management interface for virtio might be a good thing to consider,
because it seems there are at least two implementations already: kvm and
lguest.
Management code in the kernel doesn't really help unless you plan to
manage things with echo and cat.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.