On Thu, Mar 25, 2010 at 11:02 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 03/25/2010 06:50 PM, Cam Macdonell wrote:
>>
>>> Please put the spec somewhere publicly accessible with a permanent URL.
>>> I suggest a new qemu.git directory specs/. It's more important than the
>>> code IMO.
>>>
>>
>> Sorry to be pedantic, do you want a URL or the spec as part of a patch
>> that adds it as a file in qemu.git/docs/specs/
>>
>
> I leave it up to you. If you are up to hosting it independently, then just
> post a URL as part of the patch. Otherwise, I'm sure qemu.git will be more
> than happy to be the official repository for the memory sharing device
> specification. In that case, make the spec the first patch in the
> series.

Ok, I'll send it as part of the series; that way people can comment
inline easily.

>
>>> Possible later extensions:
>>> - multiple doorbells that trigger different vectors
>>> - multicast doorbells
>>>
>>
>> Since the doorbells are exposed the multicast could be done by the
>> driver. If multicast is handled by qemu, then we have different
>> behaviour when using ioeventfd/irqfd since only one eventfd can be
>> triggered by a write.
>>
>
> Multicast by the driver would require one exit per guest signalled.
> Multicast by the shared memory server needs one exit to signal an eventfd,
> then the shared memory server signals the irqfds of all members of the
> multicast group.
>
>>>> The semantics of the value written to the doorbell depends on whether
>>>> the device is using MSI or a regular pin-based interrupt.
>>>>
>>>
>>> I recommend against making the semantics interrupt-style dependent. It
>>> means the application needs to know whether MSI is in use or not, while
>>> it is generally the OS that is in control of that.
>>>
>>
>> It is basically the use of the status register that is the difference.
>> The application view of what is happening doesn't need to change,
>> especially with UIO: write to doorbells, block on read until interrupt
>> arrives. In the MSI case I could set the status register to the
>> vector that is received and then the two would be equivalent from the
>> view of the application. But, if future MSI support in UIO allows MSI
>> information (such as vector number) to be accessible in userspace,
>> then applications would become MSI dependent anyway.
>>
>
> Ah, I see. You adjusted for the different behaviours in the driver.
>
> Still I recommend dropping the status register: this allows single-msi and
> PIRQ to behave the same way. Also it is racy, if two guests signal a third,
> they will overwrite each other's status.

With shared PIRQ interrupts and no status register, how does a device
know that it generated the interrupt?

>
>>> ioeventfd/irqfd are an implementation detail. The spec should not depend
>>> on it. It needs to be written as if qemu and kvm do not exist. Again, I
>>> recommend Rusty's virtio-pci for inspiration.
>>>
>>> Applications should see exactly the same thing whether ioeventfd is
>>> enabled or not.
>>>
>>
>> The challenge I recently encountered with this is one line in the
>> eventfd implementation
>>
>> from kvm/virt/kvm/eventfd.c
>>
>> /* MMIO/PIO writes trigger an event if the addr/val match */
>> static int
>> ioeventfd_write(struct kvm_io_device *this, gpa_t addr, int len,
>>                 const void *val)
>> {
>>         struct _ioeventfd *p = to_ioeventfd(this);
>>
>>         if (!ioeventfd_in_range(p, addr, len, val))
>>                 return -EOPNOTSUPP;
>>
>>         eventfd_signal(p->eventfd, 1);
>>         return 0;
>> }
>>
>> IIUC, no matter what value is written to an ioeventfd by a guest, a
>> value of 1 is written. So ioeventfds work differently than eventfds.
>> Can we add a "multivalue" flag to ioeventfds so that the value that
>> the guest writes is written to eventfd?
>>
>
> Eventfd values are a counter, not a register.
> A read() on the other side returns the sum of all write()s (or
> eventfd_signal()s). In the context of irqfd it just means the number of
> interrupts we coalesced.
>
> Multivalue was considered at one time for a different need and rejected.
> Really, to solve the race you need a queue, and that can only be done in
> the shared memory segment using locked instructions.

I had a hunch it was probably considered. That explains why irqfd
doesn't have a datamatch field. I guess supporting multiple MSI vectors
with one doorbell per guest isn't possible if only 1 bit of information
can be communicated. So, ioeventfd/irqfd restricts MSI to 1 vector
between guests. Should multi-MSI even be supported then in the
non-ioeventfd/irqfd case? Otherwise ioeventfd/irqfd become more than an
implementation detail.

>
> --
> error compiling committee.c: too many arguments to function
>
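
For reference, the counter semantics discussed above can be checked from
userspace. This is only a minimal sketch, assuming Linux and the glibc
eventfd(2) wrapper; the helper name eventfd_write_then_read() is made up
for illustration and is not part of qemu or kvm.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/*
 * Hypothetical helper: write each of the n values in vals[] to a fresh
 * eventfd, then do a single read() on it.
 *
 * Returns the value the reader sees, or UINT64_MAX on any error
 * (including n == 0, where the non-blocking read fails with EAGAIN).
 */
uint64_t eventfd_write_then_read(const uint64_t *vals, int n)
{
        uint64_t seen = UINT64_MAX;
        int fd = eventfd(0, EFD_NONBLOCK);   /* counter starts at 0 */

        if (fd < 0)
                return UINT64_MAX;

        /* Each write() adds its 8-byte value to the kernel counter. */
        for (int i = 0; i < n; i++) {
                if (write(fd, &vals[i], sizeof(vals[i])) != sizeof(vals[i])) {
                        close(fd);
                        return UINT64_MAX;
                }
        }

        /* One read() returns the accumulated sum and resets the counter. */
        if (read(fd, &seen, sizeof(seen)) != sizeof(seen))
                seen = UINT64_MAX;

        close(fd);
        return seen;
}
```

Writing 5 and then 7 before the reader runs gives a single read() of 12,
so neither value survives as something like a vector number. Only the
accumulated count reaches the other side.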