Hi Anthony and Avi,
Anthony Liguori wrote:
Avi Kivity wrote:
Anthony Liguori wrote:
Hi Cam,
I would suggest two design changes to make here. The first is that I
think you should use virtio.
I disagree with this. While virtio is excellent at exporting guest
memory, it isn't so good at importing another guest's memory.
First we need to separate static memory sharing and dynamic memory
sharing. Static memory sharing has to be configured on start up. I
think in practice, static memory sharing is not terribly interesting
except for maybe embedded environments.
I think there is value for static memory sharing. It can be used for
fast, simple synchronization and communication between guests (and the
host) that need to share frequently updated data
(such as a simple cache or notification system). It may not be a common
task, but I think static sharing has its place and that's what this
device is for at this point.
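
To make the notification use case concrete, here is a minimal sketch of a mailbox that could live at the start of a statically shared region. Everything in it (struct shm_mailbox, mailbox_post, mailbox_poll) is a hypothetical name for illustration, not part of the device, and it assumes a single writer and C11 atomics that are lock-free on the host.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout placed at offset 0 of the shared region. */
struct shm_mailbox {
    _Atomic uint32_t seq;     /* bumped once per posted update */
    _Atomic uint32_t payload; /* latest value from the single writer */
};

/* Writer side: store the value, then publish it by bumping seq. */
void mailbox_post(struct shm_mailbox *mb, uint32_t value)
{
    atomic_store_explicit(&mb->payload, value, memory_order_relaxed);
    atomic_fetch_add_explicit(&mb->seq, 1, memory_order_release);
}

/*
 * Reader side: returns 1 and fills *out if seq moved since *last_seen,
 * otherwise returns 0. The reader may observe a newer payload than the
 * seq it loaded; it always gets a value at least as new as that seq.
 */
int mailbox_poll(struct shm_mailbox *mb, uint32_t *last_seen, uint32_t *out)
{
    uint32_t now = atomic_load_explicit(&mb->seq, memory_order_acquire);
    if (now == *last_seen)
        return 0;
    *out = atomic_load_explicit(&mb->payload, memory_order_relaxed);
    *last_seen = now;
    return 1;
}
```

Polling a counter like this needs no exits to the host, which is the "fast, simple" property I have in mind; anything larger than one word would need a seqlock or similar on top.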
Dynamic memory sharing requires bidirectional communication in order
to establish mappings and tear down mappings. You'll eventually
recreate virtio once you've implemented this communication mechanism.
The second is that I think instead of relying on mapping in device
memory to the guest, you should have the guest allocate its own
memory to dedicate to sharing.
That's not what you describe below. You're having the guest allocate
parts of its address space that happen to be used by RAM, and
overlaying those parts with the shared memory.
But from the guest's perspective, its RAM is being used for memory
sharing.
If you're clever, you could start a guest with -mem-path and then use
this mechanism to map a portion of one guest's memory into another guest
without either guest ever knowing who "owns" the memory and with exactly
the same driver on both.
Right now, you've got a bit of a hole in your implementation because
you only support files that are powers-of-two in size even though
that's not documented/enforced. This is a limitation of PCI resource
regions.
While the BAR needs to be a power of two, I don't think the RAM
backing it needs to be.
Then you need a side channel to communicate the information to the guest.
Couldn't one of the registers in BAR0 be used to store the actual
(non-power-of-two) size?
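
As a rough sketch of what I mean: the BAR is rounded up to the next power of two, and the real size is kept in a register for the guest to read. The register layout (struct shm_regs) and the helper names are made up for this example; the current device has no such register.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= backing_size (nonzero, at most 2^63). */
uint64_t bar_size_for(uint64_t backing_size)
{
    assert(backing_size != 0 && backing_size <= (UINT64_C(1) << 63));
    uint64_t bar = 1;
    while (bar < backing_size)
        bar <<= 1;
    return bar;
}

/* Hypothetical BAR0 registers exposing both sizes to the guest. */
struct shm_regs {
    uint64_t bar_size;     /* power-of-two size of the PCI region */
    uint64_t backing_size; /* actual size of the shared file */
};

struct shm_regs shm_regs_init(uint64_t backing_size)
{
    struct shm_regs r = { bar_size_for(backing_size), backing_size };
    return r;
}
```

The guest driver would then treat only the first backing_size bytes of the region as valid.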
Also, the PCI memory hole is limited in size today which is going to
put an upper bound on the amount of memory you could ever map into a
guest.
Today. We could easily lift this restriction by supporting 64-bit
BARs. It would probably take only a few lines of code.
Since you're using qemu_ram_alloc() also, it makes hotplug unworkable
too since qemu_ram_alloc() is a static allocation from a contiguous
heap.
We need to fix this anyway, for memory hotplug.
It's going to be hard to "fix" with TCG.
If you used virtio, what you could do is provide a ring queue that
was used to communicate a series of requests/responses. The exchange
might look like this:
guest: REQ discover memory region
host: RSP memory region id: 4 size: 8k
guest: REQ map region id: 4 size: 8k: sgl: {(addr=43000, size=4k),
(addr=944000,size=4k)}
host: RSP mapped region id: 4
guest: REQ notify region id: 4
host: RSP notify region id: 4
guest: REQ poll region id: 4
host: RSP poll region id: 4
That looks significantly more complex.
It's also supporting dynamic shared memory. If you do use BARs, then
perhaps you'd just do PCI hotplug to make things dynamic.
And the REQ/RSP order does not have to be in series like this. In
general, you need one entry on the queue to poll for new memory
regions, one entry for each mapped region to poll for incoming
notification, and then the remaining entries can be used to send
short-lived requests/responses.
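
The queue budget described above works out as follows. This is only arithmetic on the description, with a hypothetical function name; it is not taken from any virtio driver.

```c
#include <stdint.h>

/*
 * Entries left for short-lived requests after reserving one entry to
 * poll for new regions and one entry per mapped region to poll for
 * notifications. Returns -1 if the queue is too small.
 */
int64_t free_request_slots(uint32_t queue_size, uint32_t mapped_regions)
{
    uint64_t reserved = 1 + (uint64_t)mapped_regions;
    if (reserved > queue_size)
        return -1;
    return (int64_t)(queue_size - reserved);
}
```

With a 256-entry queue and 3 mapped regions, 252 entries remain for requests.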
It's important that the REQ map takes a scatter/gather list of
physical addresses because after running for a while, it's unlikely
that you'll be able to allocate any significant size of contiguous
memory.
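
A map request carrying a scatter/gather list might be represented and checked like this. The struct names, field layout, and validation rules are assumptions made for the sketch (the exchange above does not define a wire format), and the addresses from the example are read as hexadecimal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One physically contiguous chunk of guest memory. */
struct sg_entry {
    uint64_t addr; /* guest-physical address */
    uint64_t len;  /* length in bytes */
};

/* Hypothetical in-memory form of "REQ map region". */
struct map_req {
    uint32_t region_id;
    uint64_t size;             /* region size announced by the host */
    size_t nsg;                /* number of entries in sg[] */
    const struct sg_entry *sg;
};

/*
 * Accept the request only if every chunk is page aligned and nonempty
 * and the chunks add up to exactly the region size.
 */
bool map_req_valid(const struct map_req *req, uint64_t page_size)
{
    uint64_t total = 0;

    if (req->nsg == 0 || page_size == 0)
        return false;
    for (size_t i = 0; i < req->nsg; i++) {
        const struct sg_entry *e = &req->sg[i];
        if (e->len == 0 || e->addr % page_size || e->len % page_size)
            return false;
        if (total + e->len < total) /* 64-bit overflow */
            return false;
        total += e->len;
    }
    return total == req->size;
}
```

The request from the exchange above (region 4, 8k, two 4k chunks at 0x43000 and 0x944000) passes this check with a 4k page size.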
From a QEMU perspective, you would do memory sharing by waiting for a
map REQ from the guest and then you would complete the request by
doing an mmap(MAP_FIXED) with the appropriate parameters into
phys_ram_base.
That will fragment the vma list. And what do you do when you unmap
the region?
How does a 256M guest map 1G of shared memory?
It doesn't, but it couldn't today either because of the 32-bit BARs.
Cam