On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:

> Sure, I was trying to be as brief as possible, here's a detailed summary.
>
> Description of the system (MSI emulation in KVM):
>
> KVM supports an ioctl to assign/deassign an eventfd file to interrupt message
> in guest OS. When this eventfd is signalled, interrupt message is sent.
> This assignment is done from qemu system emulator.
>
> eventfd is signalled from device emulation in another thread in
> userspace or from kernel, which talks with guest OS through another
> eventfd and shared memory (possibility of out of process was discussed
> but never got implemented yet).
>
> Note: it's okay to delay messages from correctness point of view, but
> generally this is latency-sensitive path. If multiple identical messages
> are requested, it's okay to send a single last message, but missing a
> message altogether causes deadlocks. Sending a message when none were
> requested might in theory cause crashes, in practice doing this causes
> performance degradation.
>
> Another KVM feature is interrupt masking: guest OS requests that we
> stop sending some interrupt message, possibly modified mapping
> and re-enables this message. This needs to be done without
> involving the device that might keep requesting events:
> while masked, message is marked "pending", and guest might test
> the pending status.
>
> We can implement masking in system emulator in userspace, by using
> assign/deassign ioctls: when message is masked, we simply deassign all
> eventfd, and when it is unmasked, we assign them back.
>
> Here's some code to illustrate how this all works: assign/deassign code
> in kernel looks like the following:
>
>
> this is called to unmask interrupt
>
> static int
> kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> {
> 	struct _irqfd *irqfd, *tmp;
> 	struct file *file = NULL;
> 	struct eventfd_ctx *eventfd = NULL;
> 	int ret;
> 	unsigned int events;
>
> 	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
>
> ...
>
> 	file = eventfd_fget(fd);
> 	if (IS_ERR(file)) {
> 		ret = PTR_ERR(file);
> 		goto fail;
> 	}
>
> 	eventfd = eventfd_ctx_fileget(file);
> 	if (IS_ERR(eventfd)) {
> 		ret = PTR_ERR(eventfd);
> 		goto fail;
> 	}
>
> 	irqfd->eventfd = eventfd;
>
> 	/*
> 	 * Install our own custom wake-up handling so we are notified via
> 	 * a callback whenever someone signals the underlying eventfd
> 	 */
> 	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> 	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
>
> 	spin_lock_irq(&kvm->irqfds.lock);
>
> 	events = file->f_op->poll(file, &irqfd->pt);
>
> 	list_add_tail(&irqfd->list, &kvm->irqfds.items);
> 	spin_unlock_irq(&kvm->irqfds.lock);
>
> A.
> 	/*
> 	 * Check if there was an event already pending on the eventfd
> 	 * before we registered, and trigger it as if we didn't miss it.
> 	 */
> 	if (events & POLLIN)
> 		schedule_work(&irqfd->inject);
>
> 	/*
> 	 * do not drop the file until the irqfd is fully initialized, otherwise
> 	 * we might race against the POLLHUP
> 	 */
> 	fput(file);
>
> 	return 0;
>
> fail:
> 	...
> }

What if you do (under proper irqfd locking) something like:

	eventfd_ctx_read(ctx, 1, &cnt);
	if (irqfd->cnt != cnt) {
		irqfd->cnt = cnt;
		schedule_work(&irqfd->inject);
	}


> And deactivation deep down does this (from irqfd_cleanup_wq workqueue,
> so this is not under the spinlock):
>
> 	/*
> 	 * Synchronize with the wait-queue and unhook ourselves to
> 	 * prevent further events.
> 	 */
> B.
> 	remove_wait_queue(irqfd->wqh, &irqfd->wait);
>
> 	....
>
> 	/*
> 	 * It is now safe to release the object's resources
> 	 */
> 	eventfd_ctx_put(irqfd->eventfd);
> 	kfree(irqfd);

And:

	eventfd_ctx_read(ctx, 1, &irqfd->cnt);
	remove_wait_queue(irqfd->wqh, &irqfd->wait);



- Davide
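To make the counter-draining idea above concrete, here is a minimal single-threaded userspace model in C. It is a sketch under stated assumptions, not KVM or eventfd code: every model_* name is hypothetical, no locking is modelled, the read is assumed to return the counter and reset it to zero, and the wakeup handler is assumed to inject without consuming the counter. Draining the counter on deassign means that, on reassign, a nonzero count can only come from signals sent while the irqfd was unhooked. For simplicity the model checks for a nonzero count instead of comparing against a saved irqfd->cnt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an irqfd: just counts interrupts sent to the guest. */
struct model_irqfd {
	unsigned injected;
};

/* Hypothetical model of an eventfd: a counter plus at most one hooked waiter. */
struct model_eventfd {
	uint64_t count;
	struct model_irqfd *waiter;	/* NULL while deassigned (masked) */
};

static void model_inject(struct model_irqfd *irqfd)
{
	irqfd->injected++;
}

/* Signal: bump the counter; a hooked irqfd injects right away but,
 * by assumption, does not consume the counter. */
static void model_signal(struct model_eventfd *efd)
{
	efd->count++;
	if (efd->waiter)
		model_inject(efd->waiter);
}

/* Non-blocking read (assumed semantics): return the counter, reset it to zero. */
static uint64_t model_read(struct model_eventfd *efd)
{
	uint64_t cnt = efd->count;

	efd->count = 0;
	return cnt;
}

/* Assign (unmask): hook first, then replay anything left in the counter. */
static void model_assign(struct model_eventfd *efd, struct model_irqfd *irqfd)
{
	efd->waiter = irqfd;
	if (model_read(efd) != 0)
		model_inject(irqfd);
}

/* Deassign (mask): optionally drain counts that were already delivered,
 * then unhook (read first, unhook second, as in the second snippet). */
static void model_deassign(struct model_eventfd *efd, bool drain)
{
	if (drain)
		(void)model_read(efd);
	efd->waiter = NULL;
}

/* One requested interrupt while unmasked, a mask/unmask cycle, and an
 * optional interrupt while masked. Returns the number of injections. */
unsigned model_scenario(bool drain_on_deassign, bool signal_while_masked)
{
	struct model_eventfd efd = { 0, NULL };
	struct model_irqfd irqfd = { 0 };

	model_assign(&efd, &irqfd);	/* initial assign: counter empty */
	model_signal(&efd);		/* delivered at once via the waiter */
	model_deassign(&efd, drain_on_deassign);	/* guest masks */
	if (signal_while_masked)
		model_signal(&efd);	/* only recorded in the counter */
	model_assign(&efd, &irqfd);	/* guest unmasks */
	return irqfd.injected;
}
```

Without draining, model_scenario(false, false) returns 2 although only one interrupt was requested: the count left over from the already-delivered event looks pending at unmask time, which is the spurious message described in the quoted text. With draining it returns 1, and a signal raised while masked is still replayed on unmask (model_scenario(true, true) returns 2). Reading before unhooking appears to err toward a duplicate rather than a lost message if a signal lands between the two steps, the safer side given that a missed message can deadlock; the single-threaded model does not exercise that window.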