Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

On Thu, Mar 25, 2010 at 3:04 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 03/25/2010 08:08 AM, Cam Macdonell wrote:
>>
>> Support an inter-VM shared memory device that maps a shared-memory object
>> as a PCI device in the guest.  This patch also supports interrupts between
>> guests by communicating over a Unix domain socket.  This patch applies to
>> the qemu-kvm repository.
>>
>> Changes in this version: use of the qdev format, plus optional use of MSI
>> and ioeventfd/irqfd.
>>
>> The non-interrupt version is supported by passing the shm parameter
>>
>>     -device ivshmem,size=<size in MB>[,shm=<shm_name>]
>>
>> which will simply map the shm object into a BAR.
>>
>> Interrupts are supported between multiple VMs by using a shared memory
>> server, to which each qemu instance connects through a socket character
>> device:
>>
>>     -device ivshmem,size=<size in MB>[,chardev=<chardev name>][,irqfd=on]
>>             [,msi=on][,nvectors=n]
>>     -chardev socket,path=<path>,id=<chardev name>
>>
>> The server passes file descriptors for the shared memory object and
>> eventfds (our interrupt mechanism) to the respective qemu instances.
>>
>> When using interrupts, VMs communicate with a shared memory server that
>> passes the shared memory object file descriptor using SCM_RIGHTS.  The
>> server assigns each VM an ID number and sends this ID number to the qemu
>> process along with a series of eventfd file descriptors, one per guest
>> using the shared memory server.  These eventfds are used to send
>> interrupts between guests.  Each guest listens on the eventfd
>> corresponding to its own ID and may use the others to send interrupts to
>> other guests.
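For readers unfamiliar with SCM_RIGHTS, here is a minimal sketch of passing one descriptor plus an ID over a Unix domain socket. It assumes, purely for illustration, that the server sends the ID as a 64-bit integer in the same message as the descriptor; the real server's wire format lives in the nahanni repo and may differ. `send_fd` and `recv_fd` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor plus a 64-bit payload (e.g. a VM ID) over a
 * connected Unix domain socket. Returns 0 on success, -1 on error.
 * The payload layout is an assumption, not the nahanni wire format. */
int send_fd(int sock, int fd, int64_t payload)
{
    struct iovec iov = { .iov_base = &payload, .iov_len = sizeof(payload) };
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {0};

    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    /* Attach the descriptor as SCM_RIGHTS ancillary data. */
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(payload) ? 0 : -1;
}

/* Receive the payload and, if attached, one descriptor. Returns the new
 * descriptor (a duplicate valid in this process), or -1 if none/error. */
int recv_fd(int sock, int64_t *payload)
{
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof(*payload) };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {0};

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*payload))
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS) {
        int fd;
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd;
    }
    return -1;
}
```

The received descriptor refers to the same open file (shared memory object or eventfd) as the sender's, which is what lets each qemu instance mmap the same region and signal the same eventfds.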
>>
>
> Please put the spec somewhere publicly accessible with a permanent URL.  I
> suggest a new qemu.git directory specs/.  It's more important than the code
> IMO.

Sorry to be pedantic: do you want a URL, or the spec as part of a patch
that adds it as a file in qemu.git/docs/specs/?

>
>> enum ivshmem_registers {
>>     IntrMask = 0,
>>     IntrStatus = 4,
>>     IVPosition = 8,
>>     Doorbell = 12
>> };
>>
>> The first two registers are the interrupt mask and status registers.
>> Mask and status are only used with pin-based interrupts; they are unused
>> with MSI interrupts.  The IVPosition register is read-only and reports
>> the guest's ID number.  Interrupts are triggered when a message is
>> received on the guest's eventfd from another VM.  To trigger an event, a
>> guest must write to another guest's Doorbell.  The "Doorbells" begin at
>> offset 12.  A particular guest's doorbell offset in the MMIO region is
>> equal to
>>
>> guest_id * 32 + Doorbell
>>
>> The doorbell register for each guest is 32 bits.  The doorbell-per-guest
>> design was motivated by the use of ioeventfd.
>>
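As a quick check on the layout above, this sketch computes a guest's doorbell offset with the stated formula and rings it. It assumes `regs` points at the guest's mapping of the MMIO BAR; `doorbell_offset` and `ring_doorbell` are hypothetical helper names, not part of the patch.

```c
#include <stdint.h>

/* Register offsets within the MMIO BAR, as listed in the spec above. */
enum ivshmem_registers {
    IntrMask   = 0,
    IntrStatus = 4,
    IVPosition = 8,
    Doorbell   = 12
};

/* Byte offset of a guest's doorbell: guest_id * 32 + Doorbell. */
static inline uint32_t doorbell_offset(uint32_t guest_id)
{
    return guest_id * 32u + Doorbell;
}

/* Ring a guest's doorbell with a 32-bit write. 'regs' is assumed to be
 * the mapped MMIO BAR, indexed in 32-bit words. */
static inline void ring_doorbell(volatile uint32_t *regs,
                                 uint32_t guest_id, uint32_t value)
{
    regs[doorbell_offset(guest_id) / sizeof(uint32_t)] = value;
}
```

So guest 0's doorbell is at byte 12, guest 1's at 44, and guest 2's at 76.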
>
> You can also use a single doorbell register with ioeventfd, as it can match
> against the data written.  If you go this route, you'd have two doorbells,
> one where you write a guest ID to send an interrupt to that guest, and one
> where any write generates a multicast.
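One way to picture the two-doorbell alternative is as a dispatch from (offset, written value) to the set of guests to interrupt. This is only a model: the offsets `UNICAST_DOORBELL` and `MULTICAST_DOORBELL` are hypothetical, and whether a multicast also signals the sender is left open in the suggestion, so here it signals every guest.

```c
#include <stdint.h>

/* Hypothetical offsets for the two doorbells in the suggestion above;
 * illustrative only, not part of the posted spec. */
enum { UNICAST_DOORBELL = 12, MULTICAST_DOORBELL = 16 };

/* Return a bitmask of guest IDs to interrupt for a write of 'value' at
 * 'offset', given 'nguests' guests (at most 32 in this sketch). */
uint32_t doorbell_targets(uint32_t offset, uint32_t value, uint32_t nguests)
{
    uint32_t all = (nguests >= 32) ? 0xFFFFFFFFu : ((1u << nguests) - 1u);

    switch (offset) {
    case UNICAST_DOORBELL:
        /* Datamatch on the written value: it names the destination. */
        return (value < nguests && value < 32) ? (1u << value) : 0;
    case MULTICAST_DOORBELL:
        /* Any write signals every guest. */
        return all;
    default:
        return 0;
    }
}
```

With ioeventfd datamatch, the unicast case would become one ioeventfd per guest ID registered on the same address, each matching a different value.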

I thought of using the datamatch.

>
> Possible later extensions:
> - multiple doorbells that trigger different vectors
> - multicast doorbells

Since the doorbells are exposed, multicast could be done by the driver.
If multicast is handled by qemu, then we get different behaviour when
using ioeventfd/irqfd, since only one eventfd can be triggered by a
write.
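Driver-side multicast, as described above, is just a loop of per-guest doorbell writes. This sketch assumes the driver knows the number of guests (`nguests`, a hypothetical parameter; the spec does not say how it is discovered) and its own ID from IVPosition (`self`).

```c
#include <stdint.h>

/* Word index of a guest's doorbell: (guest_id * 32 + 12) / 4. */
static uint32_t doorbell_index(uint32_t guest_id)
{
    return (guest_id * 32u + 12u) / sizeof(uint32_t);
}

/* Write 'value' to every peer's doorbell except our own. Each write maps
 * to exactly one eventfd, so the behaviour is the same with or without
 * ioeventfd. Returns the number of doorbells rung. */
unsigned multicast(volatile uint32_t *regs, uint32_t nguests,
                   uint32_t self, uint32_t value)
{
    unsigned sent = 0;

    for (uint32_t id = 0; id < nguests; id++) {
        if (id == self)
            continue;
        regs[doorbell_index(id)] = value;
        sent++;
    }
    return sent;
}
```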

>
>> The semantics of the value written to the doorbell depend on whether the
>> device is using MSI or a regular pin-based interrupt.
>>
>
> I recommend against making the semantics interrupt-style dependent.  It
> means the application needs to know whether MSI is in use or not, while it
> is generally the OS that is in control of that.

It is basically the use of the status register that makes the difference.
The application's view of what is happening doesn't need to change,
especially with UIO: write to doorbells, block on read until an interrupt
arrives.  In the MSI case I could set the status register to the vector
that is received, and then the two would be equivalent from the view of
the application.  But if future MSI support in UIO makes MSI information
(such as the vector number) accessible in userspace, then applications
would become MSI-dependent anyway.
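The UIO pattern mentioned above (write a doorbell, then block on read) might look like this sketch. A read() on a UIO device node blocks until an interrupt and returns a 4-byte interrupt count; `uio_wait` and `notify_and_wait` are hypothetical names, and `regs` is assumed to be the BAR mapped from the UIO device.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Block until the UIO device reports an interrupt. Returns the 32-bit
 * interrupt count read from the device, or -1 on error or EOF. */
int64_t uio_wait(int uio_fd)
{
    uint32_t count;
    ssize_t n;

    do {
        n = read(uio_fd, &count, sizeof(count));
    } while (n < 0 && errno == EINTR);

    return n == (ssize_t)sizeof(count) ? (int64_t)count : -1;
}

/* Ring a peer's doorbell (offset peer * 32 + 12), then wait for our own
 * interrupt. The application code is the same for MSI and pin-based. */
int64_t notify_and_wait(volatile uint32_t *regs, int uio_fd,
                        uint32_t peer, uint32_t value)
{
    regs[(peer * 32u + 12u) / sizeof(uint32_t)] = value;
    return uio_wait(uio_fd);
}
```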

>
>> Regular Interrupts
>> ------------------
>>
>> If regular interrupts are used (because either the guest does not support
>> MSI or the user disabled it on the command line), then the value written
>> to a guest's doorbell is what that guest's status register will be set to.
>>
>> A status of (2^32 - 1) indicates that a new guest has joined.  Guests
>> should not send a message of this value for any other reason.
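In pin-based mode, the receiver therefore has to distinguish the reserved join value from ordinary messages, and senders must avoid it. A small sketch of both checks follows; `IVSHMEM_NEW_GUEST`, `classify_status`, and `is_valid_send_value` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status value reserved to announce that a new guest has joined. */
#define IVSHMEM_NEW_GUEST 0xFFFFFFFFu   /* 2^32 - 1 */

enum ivshmem_event { EV_NEW_GUEST, EV_MESSAGE };

/* Interpret the IntrStatus value set by a peer's doorbell write. */
enum ivshmem_event classify_status(uint32_t status)
{
    return status == IVSHMEM_NEW_GUEST ? EV_NEW_GUEST : EV_MESSAGE;
}

/* Senders must not use the reserved value for anything else. */
bool is_valid_send_value(uint32_t value)
{
    return value != IVSHMEM_NEW_GUEST;
}
```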
>>
>> Message Signalled Interrupts
>> ----------------------------
>>
>> The important thing to remember with MSI is that it is only a signal; no
>> status is set (since MSI interrupts are not shared).  All information
>> other than the interrupt itself should be communicated via the shared
>> memory region.  MSI is on by default.  It can be turned off by passing
>> msi=off in the device parameters.
>>
>
>> If the device uses MSI, then the value written to the doorbell is the MSI
>> vector that will be raised.  Vector 0 is used to notify that a new guest
>> has joined.  Vector 0 cannot be triggered by another guest, since a value
>> of 0 does not trigger an eventfd.
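The claim that a value of 0 does not trigger an eventfd follows from eventfd semantics: a write adds its value to the counter, and the descriptor only becomes readable when the counter is non-zero. This sketch demonstrates that on Linux and adds a sender-side vector check; `eventfd_pending` and `msi_vector_ok` are hypothetical helpers, with `nvectors` taken from the device parameter.

```c
#include <poll.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>

/* Non-blocking check: does the eventfd currently have a pending event? */
bool eventfd_pending(int efd)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN);
}

/* Sender-side check in MSI mode: vector 0 is reserved for "new guest"
 * and cannot be signalled anyway; vectors must be below nvectors. */
bool msi_vector_ok(uint32_t vector, uint32_t nvectors)
{
    return vector != 0 && vector < nvectors;
}
```

Writing 0 to a fresh eventfd leaves it unreadable, while writing 3 makes it readable and a subsequent read returns 3.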
>>
>
> Ah, looks like we approached the vector/guest matrix from different
> directions.
>
>> ioeventfd/irqfd
>> ---------------
>>
>> ioeventfd/irqfd is turned on by passing irqfd=on in the device parameters
>> (it is off by default).  When using ioeventfd/irqfd, the only interrupt
>> value that can be passed to another guest is 1, regardless of the value
>> written to a guest's Doorbell.
>>
>
> ioeventfd/irqfd are an implementation detail.  The spec should not depend on
> it.  It needs to be written as if qemu and kvm do not exist.  Again, I
> recommend Rusty's virtio-pci for inspiration.
>
> Applications should see exactly the same thing whether ioeventfd is enabled
> or not.

The challenge I recently encountered with this is one line in the
eventfd implementation, in kvm/virt/kvm/eventfd.c:

/* MMIO/PIO writes trigger an event if the addr/val match */
static int
ioeventfd_write(struct kvm_io_device *this, gpa_t addr, int len,
        const void *val)
{
    struct _ioeventfd *p = to_ioeventfd(this);

    if (!ioeventfd_in_range(p, addr, len, val))
        return -EOPNOTSUPP;

    eventfd_signal(p->eventfd, 1);
    return 0;
}

IIUC, no matter what value a guest writes to an ioeventfd, the eventfd is
signalled with a value of 1.  So ioeventfds behave differently from plain
eventfds.  Can we add a "multivalue" flag to ioeventfds so that the value
the guest writes is passed through to the eventfd?
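The difference can be modelled in userspace: a plain eventfd write adds the written value to the counter, whereas the ioeventfd path above always calls eventfd_signal(p->eventfd, 1). In this sketch `model_ioeventfd_write` is a hypothetical function, and its `multivalue` argument models the flag proposed above, not an existing KVM option.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>

/* Model what reaches the eventfd when a guest writes 'guest_value' to an
 * ioeventfd-backed doorbell. Without 'multivalue' the value is dropped
 * and 1 is signalled, mirroring eventfd_signal(p->eventfd, 1). */
int model_ioeventfd_write(int efd, uint64_t guest_value, bool multivalue)
{
    return eventfd_write(efd, multivalue ? guest_value : 1);
}
```

One caveat applies even with such a flag: the eventfd counter sums pending writes, so two writes of 5 and 7 that arrive before the receiver reads are seen as a single value of 12 (or 2 without the flag). A status value can therefore only be recovered intact if at most one write is pending at a time.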

>
>> Sample programs, init scripts and the shared memory server are available
>> in a git repo here:
>>
>>     www.gitorious.org/nahanni
>>
>> Cam Macdonell (2):
>>   Support adding a file to qemu's ram allocation
>>   Inter-VM shared memory PCI device
>>
>
> Do you plan to maintain the server indefinitely in that repository?  If not,
> we can put it in qemu.git, perhaps under contrib/.

In qemu.git is fine with me.

>
> --
> error compiling committee.c: too many arguments to function
>
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
