Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

Cam Macdonell <cam@xxxxxxxxxxxxxx> · Tue, 11 May 2010 08:17:08 -0600

On Tue, May 11, 2010 at 8:03 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 05/11/2010 04:10 PM, Anthony Liguori wrote:
>>
>> On 05/11/2010 02:59 AM, Avi Kivity wrote:
>>>>
>>>> (Replying again to list)
>>>>
>>>> What data structure would you use?  For a lockless ring queue, you can
>>>> only support a single producer and consumer.  To achieve bidirectional
>>>> communication in virtio, we always use two queues.
>>>
>>>
>>> You don't have to use a lockless ring queue.  You can use locks
>>> (spinlocks without interrupt support, full mutexes with interrupts) and any
>>> data structure you like.  Say a hash table + LRU for a shared cache.
>>
>> Yeah, the mailslot enables this.
>>
>> I think the question boils down to whether we can support transparent peer
>> connections and disconnections.  I think that's important in order to
>> support transparent live migration.
>>
>> If you have two peers that are disconnected and then connect to each
>> other, there's simply no way to choose whose content gets preserved.  It's
>> necessary to designate one peer as a master in order to break the tie.
>
> The master is the shared memory area.  It's a completely separate entity
> that is represented by the backing file (or shared memory server handing out
> the fd to mmap).  It can exist independently of any guest.

I think the master/peer idea would be necessary if we were sharing
guest memory (sharing guest A's memory with guest B).  Then if the
master (guest A) dies, perhaps something needs to happen to preserve
the memory contents.  But since we're sharing host memory, the
applications in the guests can race to determine the master by
grabbing a lock at offset 0 or by using the lowest VM ID.
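
As a rough illustration of racing for the master role through a lock
word at offset 0, something like the following could work.  This is
only a sketch under assumptions: the layout (a 32-bit owner word at
the start of the region), the names try_become_master(),
current_master() and release_master(), and the NO_MASTER value are all
hypothetical, and it assumes 32-bit atomics are lock-free and coherent
across the processes mapping the region.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the first 32-bit word of the shared region holds
 * the VM ID of the elected master, or NO_MASTER if nobody has claimed it. */
#define NO_MASTER 0u

/* Try to claim the master role.  Returns true only for the caller whose
 * compare-and-swap finds the owner word still unclaimed.
 * vm_id must be non-zero so it cannot be confused with NO_MASTER. */
static bool try_become_master(void *shm_base, uint32_t vm_id)
{
    assert(shm_base != NULL && vm_id != NO_MASTER);
    _Atomic uint32_t *owner = (_Atomic uint32_t *)shm_base;
    uint32_t expected = NO_MASTER;
    return atomic_compare_exchange_strong(owner, &expected, vm_id);
}

/* Read the current master's VM ID (NO_MASTER if unclaimed). */
static uint32_t current_master(void *shm_base)
{
    return atomic_load((_Atomic uint32_t *)shm_base);
}

/* Give up the master role.  Has no effect unless vm_id is the owner. */
static void release_master(void *shm_base, uint32_t vm_id)
{
    uint32_t expected = vm_id;
    atomic_compare_exchange_strong((_Atomic uint32_t *)shm_base,
                                   &expected, NO_MASTER);
}
```

Each guest application would call try_become_master() on its mapping
of the region; exactly one caller sees true.  Note that this sketch
does not handle a master that dies while holding the word, which is
left to the application.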

Looking at it another way, it is the applications using shared memory
that may or may not need a master.  The QEMU processes don't need the
concept of a master, since the memory belongs to the host.

>
>>
>> So this could simply involve an additional option to the shared memory
>> driver: role=master|peer.  If role=master, when a new shared memory segment
>> is mapped, the contents of the BAR ram is memcpy()'d to the shared memory
>> segment.  In either case, the contents of the shared memory segment should
>> be memcpy()'d to the BAR ram whenever the shared memory segment is
>> disconnected.
>
> I don't understand why we need separate BAR ram and shared memory.  Have
> just shared memory, exposed by the BAR when connected.  When the PCI card is
> disconnected from shared memory, the BAR should discard writes and return
> all 1s for reads.
>
> Having a temporary RAM area while disconnected doesn't serve a purpose
> (since it exists only for a short while) and increases the RAM load.

I agree with Avi here.  If a guest wants to view shared memory, then
it needs to stay connected.  I think being able to see the contents
(via the memcpy()) even though the guest is disconnected would be
confusing.
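
To make the disconnected behaviour Avi describes concrete (writes
discarded, reads return all 1s), here is a small model of it.  It is a
sketch only, not QEMU code: struct shm_bar, bar_read32() and
bar_write32() are made-up names, and a NULL mapping stands in for "the
card is disconnected from shared memory".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the BAR: mem points at the mapped shared memory,
 * or is NULL while the device is disconnected from it. */
struct shm_bar {
    uint8_t *mem;
    size_t   size;
};

/* 32-bit read: all 1s while disconnected, else the shared contents. */
static uint32_t bar_read32(const struct shm_bar *bar, size_t off)
{
    assert(off % 4 == 0 && off + 4 <= bar->size);
    if (bar->mem == NULL)
        return UINT32_MAX;
    uint32_t val;
    memcpy(&val, bar->mem + off, sizeof val);
    return val;
}

/* 32-bit write: silently dropped while disconnected. */
static void bar_write32(struct shm_bar *bar, size_t off, uint32_t val)
{
    assert(off % 4 == 0 && off + 4 <= bar->size);
    if (bar->mem == NULL)
        return;
    memcpy(bar->mem + off, &val, sizeof val);
}
```

With this model there is no temporary RAM copy at all: a write made
while disconnected is simply lost, and reconnecting exposes whatever
the shared memory currently holds.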

>
>> I believe role=master should be default because I think a relationship of
>> master/slave is going to be much more common than peering.
>
> What if you have N guests?  What if the master disconnects?
>
>>
>>>>
>>>> If you're adding additional queues to support other levels of
>>>> communication, you can always use different areas of shared memory.
>>>
>>> You'll need O(n^2) shared memory areas (n=peer count), and it is a lot
>>> less flexible than real shared memory.  Consider using threading where the
>>> only communication among threads is a pipe (erlang?)
>>
>> I can't think of a use of multiple peers via shared memory today with
>> virtualization.  I know lots of master/slave uses of shared memory though.
>>  I agree that it's useful to support from an academic perspective but I
>> don't believe it's going to be the common use.
>
> Large shared cache.  That use case even survives live migration if you use
> lockless algorithms.
>
> --
> error compiling committee.c: too many arguments to function
>
>