On Mon, May 10, 2010 at 11:52 AM, Anthony Liguori <anthony@xxxxxxxxxxxxx> wrote:
> On 05/10/2010 12:43 PM, Cam Macdonell wrote:
>>
>> On Mon, May 10, 2010 at 11:25 AM, Anthony Liguori <anthony@xxxxxxxxxxxxx> wrote:
>>>
>>> On 05/10/2010 11:59 AM, Avi Kivity wrote:
>>>>
>>>> On 05/10/2010 06:38 PM, Anthony Liguori wrote:
>>>>>
>>>>>>> Otherwise, if the BAR is allocated during initialization, I would
>>>>>>> have to use MAP_FIXED to mmap the memory.  This is what I did
>>>>>>> before the qemu_ram_mmap() function was added.
>>>>>>>
>>>>>>
>>>>>> What would happen to any data written to the BAR before the
>>>>>> handshake completed?  I think it would disappear.
>>>>>>
>>>>>
>>>>> You don't have to do MAP_FIXED.  You can allocate a ram area and map
>>>>> that in when disconnected.  When you connect, you create another ram
>>>>> area and memcpy() the previous ram area to the new one.  You then
>>>>> map the second ram area in.
>>>>>
>>>>
>>>> But it's a shared memory area.  Other peers could have connected and
>>>> written some data in.  The memcpy() would destroy their data.
>>>>
>>>
>>> Why attempt to support multi-master shared memory?  What's the
>>> use-case?
>>>
>>
>> I don't see it as multi-master; rather, the latest guest to join
>> shouldn't have its contents take precedence.  In developing this
>> patch, my motivation has been to let the guests decide.  If the memcpy
>> is always done, even when no data has been written, a guest cannot
>> join without overwriting everything.
>>
>> One use case we're looking at is running a map-reduce framework like
>> Hadoop or Phoenix across VMs.  However, if a workqueue is stored in
>> shared memory or data transfers pass through it, the system can't
>> scale up the number of workers, because each new guest would erase the
>> shared memory (and with it the workqueue or any in-progress data
>> transfer).
>>
>
> (Replying again to list)

Sorry about that.

> What data structure would you use?  For a lockless ring queue, you can
> only support a single producer and consumer.  To achieve bidirectional
> communication in virtio, we always use two queues.

MCS locks can work with multiple producers/consumers, either with busy
waiting or using the doorbell mechanism (there is a rough sketch at the
end of this message).

>
> If you're adding additional queues to support other levels of
> communication, you can always use different areas of shared memory.

True, and my point is simply that the memcpy would wipe those all out.

>
> I guess this is the point behind the doorbell mechanism?

Yes.

>
> Regards,
>
> Anthony Liguori
>
>> In cases where the latest guest to join wants to clear the memory, it
>> can do so without the automatic memcpy.  The guest can do a memset
>> once it knows the memory is attached.  My opinion is to leave it to
>> the guests and the application that is using the shared memory to
>> decide what to do when a guest joins.
>>
>> Cam
>>
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>>>>
>>>>> From the guest's perspective, it's totally transparent.  For the
>>>>> backend, I'd suggest having an explicit "initialized" ack or
>>>>> something so that it knows that the data is now mapped to the guest.
>>>>>
>>>>
>>>> From the peers' perspective, it's non-transparent :(
>>>>
>>>> Also it doubles the transient memory requirement.
>>>>
>>>>>
>>>>> If you're doing just a ring queue in shared memory, it should allow
>>>>> disconnect/reconnect during live migration asynchronously to the
>>>>> actual qemu live migration.
>>>>>
>>>>
>>>> Live migration of guests using shared memory is interesting.  You'd
>>>> need to freeze all peers on one node, disconnect, reconnect, and
>>>> restart them on the other node.
>>>>
>>>
>>
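
P.S.  Since MCS locks came up above, here is a rough, untested sketch of
what I mean.  It is only an illustration: the type and function names are
made up, it uses plain C11 atomics and busy-waits instead of sleeping on
the doorbell, and in the real device the queue links would have to be
offsets into the shared BAR rather than raw pointers, since peers may map
the region at different virtual addresses.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per guest, living in the shared region. */
struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor waiting on the lock */
    atomic_bool locked;                /* true while this node must wait */
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail;   /* last waiter, NULL when free */
};

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
    struct mcs_node *prev;

    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);

    /* Atomically append ourselves to the tail of the wait queue. */
    prev = atomic_exchange(&lock->tail, me);
    if (prev == NULL)
        return;                        /* lock was free, we now hold it */

    atomic_store(&prev->next, me);
    while (atomic_load(&me->locked))
        ;                              /* spin, or block on the doorbell */
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
    struct mcs_node *succ = atomic_load(&me->next);

    if (succ == NULL) {
        struct mcs_node *expected = me;

        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;

        /* A successor is in the middle of linking itself in; wait. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    /* Hand the lock to the successor (or ring its doorbell here). */
    atomic_store(&succ->locked, false);
}

Each producer or consumer takes the lock around its queue manipulation, so
the structure isn't limited to a single producer and consumer the way a
lockless SPSC ring is, and the FIFO ordering of the MCS wait queue keeps
it fair with many peers.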