Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

On 03/25/2010 08:08 AM, Cam Macdonell wrote:
Support an inter-vm shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guests by communicating over a unix domain socket.  This patch applies to the
qemu-kvm repository.

Changes in this version: the device now uses the qdev format, and MSI and
ioeventfd/irqfd are optionally supported.

The non-interrupt version is supported by passing the shm parameter

     -device ivshmem,size=<size in MB>,[shm=<shm_name>]

which will simply map the shm object into a BAR.

Interrupts are supported between multiple VMs by using a shared memory server
that each VM connects to through a socket character device

     -device ivshmem,size=<size in MB>[,chardev=<chardev name>][,irqfd=on]
             [,msi=on][,nvectors=n]
     -chardev socket,path=<path>,id=<chardev name>

The server passes file descriptors for the shared memory object and eventfds (our
interrupt mechanism) to the respective qemu instances.

When using interrupts, VMs communicate with a shared memory server that passes
the shared memory object file descriptor using SCM_RIGHTS.  The server assigns
each VM an ID number and sends this ID number to the Qemu process along with a
series of eventfd file descriptors, one per guest using the shared memory
server.  These eventfds will be used to send interrupts between guests.  Each
guest listens on the eventfd corresponding to its ID and may use the others
for sending interrupts to other guests.
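
The descriptor passing described above can be sketched in C as follows. This is only an illustration of SCM_RIGHTS over a unix domain socket, not the actual server protocol: the helper names send_fd and recv_fd are hypothetical, and carrying a single long (for example a guest ID) as the message payload is an assumption, since the wire format is not described here.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send one descriptor plus a long payload (assumed
 * here to be a guest ID) over a connected unix domain socket.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, long payload, int fd)
{
    struct iovec iov = { .iov_base = &payload, .iov_len = sizeof(payload) };
    union {                         /* union keeps the buffer aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;   /* ancillary data carries descriptors */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(payload) ? 0 : -1;
}

/* Hypothetical helper: receive the payload and the descriptor sent with it.
 * Returns the new descriptor, or -1 if none arrived or on error. */
static int recv_fd(int sock, long *payload)
{
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof(*payload) };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*payload))
        return -1;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is what lets each qemu instance map the same shared memory object and signal the same eventfds.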

Please put the spec somewhere publicly accessible with a permanent URL. I suggest a new qemu.git directory specs/. It's more important than the code IMO.

enum ivshmem_registers {
     IntrMask = 0,
     IntrStatus = 4,
     IVPosition = 8,
     Doorbell = 12
};

The first two registers are the interrupt mask and status registers.  Mask and
status are only used with pin-based interrupts.  They are unused with MSI
interrupts.  The IVPosition register is read-only and reports the guest's ID
number.  Interrupts are triggered when a message is received on the guest's
eventfd from another VM.  To trigger an event, a guest must write to another
guest's Doorbell.  The "Doorbells" begin at offset 12.  A particular guest's
doorbell offset in the MMIO region is equal to

guest_id * 32 + Doorbell

The doorbell register for each guest is 32 bits wide.  The doorbell-per-guest
design was motivated by ioeventfd.
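
As a rough sketch of the addressing above, the formula can be written out in C. The enum is copied from the description; the function names doorbell_offset and ring_doorbell are hypothetical, and the sketch assumes the formula yields a byte offset into the MMIO region and that the caller has already mapped that region.

```c
#include <stdint.h>

enum ivshmem_registers {
    IntrMask = 0,
    IntrStatus = 4,
    IVPosition = 8,
    Doorbell = 12
};

/* Offset of a guest's doorbell, using the formula exactly as stated:
 * guest_id * 32 + Doorbell.  Treating the result as a byte offset is an
 * assumption of this sketch. */
static inline uint32_t doorbell_offset(uint32_t guest_id)
{
    return guest_id * 32u + Doorbell;
}

/* Write a 32-bit value to another guest's doorbell.  'mmio' is assumed to
 * point at the start of the mapped register region. */
static inline void ring_doorbell(volatile uint8_t *mmio, uint32_t guest_id,
                                 uint32_t value)
{
    volatile uint32_t *reg =
        (volatile uint32_t *)(mmio + doorbell_offset(guest_id));
    *reg = value;
}
```

Under these assumptions guest 0's doorbell sits at offset 12 and guest 1's at offset 44.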

You can also use a single doorbell register with ioeventfd, as it can match against the data written. If you go this route, you'd have two doorbells, one where you write a guest ID to send an interrupt to that guest, and one where any write generates a multicast.

Possible later extensions:
- multiple doorbells that trigger different vectors
- multicast doorbells

The semantics of the value written to the doorbell depend on whether the
device is using MSI or a regular pin-based interrupt.

I recommend against making the semantics interrupt-style dependent. It means the application needs to know whether MSI is in use or not, while it is generally the OS that is in control of that.

Regular Interrupts
------------------

If regular interrupts are used (due to either a guest not supporting MSI or the
user specifying not to use them on the command-line) then the value written to
a guest's doorbell is what the guest's status register will be set to.

A status of (2^32 - 1) indicates that a new guest has joined.  Guests
should not send a message of this value for any other reason.

Message Signalled Interrupts
----------------------------

The important thing to remember with MSI is that it is only a signal, no
status is set (since MSI interrupts are not shared).  All information other
than the interrupt itself should be communicated via the shared memory region.
MSI is on by default.  It can be turned off by passing msi=off as a device parameter.

If the device uses MSI then the value written to the doorbell is the MSI vector
that will be raised.  Vector 0 is used to notify that a new guest has joined.
Vector 0 cannot be triggered by another guest since a value of 0 does not
trigger an eventfd.
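
The remark that a value of 0 does not trigger an eventfd follows from how the Linux eventfd counter works: a write adds its value to the counter, and a reader is only woken when the counter is non-zero. A small C sketch, with the hypothetical helper name signal_and_poll and assuming a descriptor created with EFD_NONBLOCK, illustrates this. It is not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Write 'value' to the eventfd, then try a non-blocking read.
 * Returns 1 and stores the counter in *out if an event was pending,
 * 0 if nothing was pending (read failed with EAGAIN), -1 on other errors.
 * The descriptor is assumed to have been created with EFD_NONBLOCK. */
static int signal_and_poll(int efd, uint64_t value, uint64_t *out)
{
    ssize_t n;

    if (write(efd, &value, sizeof(value)) != (ssize_t)sizeof(value))
        return -1;

    n = read(efd, out, sizeof(*out));
    if (n == (ssize_t)sizeof(*out))
        return 1;                   /* counter was non-zero: event seen */
    if (n < 0 && errno == EAGAIN)
        return 0;                   /* counter still zero: no event */
    return -1;
}
```

Writing 0 leaves the counter unchanged, so the receiving side sees no event, while any non-zero write makes the descriptor readable.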

Ah, looks like we approached the vector/guest matrix from different directions.

ioeventfd/irqfd
---------------

ioeventfd/irqfd is turned on by passing irqfd=on as a device parameter (it is
off by default).  When using ioeventfd/irqfd the only interrupt value that can
be passed to another guest is 1, regardless of the value written to a guest's
Doorbell.

ioeventfd/irqfd are an implementation detail. The spec should not depend on it. It needs to be written as if qemu and kvm do not exist. Again, I recommend Rusty's virtio-pci for inspiration.

Applications should see exactly the same thing whether ioeventfd is enabled or not.

Sample programs, init scripts and the shared memory server are available in a
git repo here:

     www.gitorious.org/nahanni

Cam Macdonell (2):
   Support adding a file to qemu's ram allocation
   Inter-VM shared memory PCI device

Do you plan to maintain the server indefinitely in that repository? If not, we can put it in qemu.git, perhaps under contrib/.

--
error compiling committee.c: too many arguments to function
