Re: [PATCH 0/2] eventfd: new EFD_STATE flag

On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:

> Sure, I was trying to be as brief as possible; here's a detailed summary.
> 
> Description of the system (MSI emulation in KVM):
> 
> KVM supports an ioctl to assign/deassign an eventfd file to an interrupt message
> in the guest OS.  When this eventfd is signalled, the interrupt message is sent.
> This assignment is done from the qemu system emulator.
> 
> The eventfd is signalled either from device emulation in another thread in
> userspace or from the kernel; the device emulation talks with the guest OS
> through another eventfd and shared memory (an out-of-process implementation
> was discussed but has not been implemented yet).
> 
> Note: it's okay to delay messages from a correctness point of view, but
> this is generally a latency-sensitive path. If multiple identical messages
> are requested, it's okay to send a single last message, but missing a
> message altogether causes deadlocks.  Sending a message when none was
> requested might in theory cause crashes; in practice it causes
> performance degradation.
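The coalescing requirement above matches how a plain eventfd counter already behaves. A minimal userspace sketch using the standard eventfd(2) interface (the helper names efd_signal and efd_drain are made up for this illustration): each write adds to the 64-bit counter, and one read returns the accumulated sum and resets it, so several requests become a single delivery.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: add n to the eventfd counter ("request" n messages). */
static int efd_signal(int fd, uint64_t n)
{
	return write(fd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/*
 * Hypothetical helper: consume all pending requests at once. Without
 * EFD_SEMAPHORE, read() returns the accumulated counter and resets it to 0.
 * The fd must be created with EFD_NONBLOCK, so that read() fails with
 * EAGAIN instead of blocking; in that case 0 is returned.
 */
static uint64_t efd_drain(int fd)
{
	uint64_t cnt;

	if (read(fd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
		return 0;
	return cnt;
}
```

For example, after three calls to efd_signal(fd, 1) on an fd created with eventfd(0, EFD_NONBLOCK), efd_drain(fd) returns 3 and a second call returns 0.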
> 
> Another KVM feature is interrupt masking: the guest OS requests that we
> stop sending some interrupt message, possibly modifies its mapping,
> and then re-enables the message. This needs to be done without
> involving the device, which might keep requesting events:
> while the message is masked, it is marked "pending", and the guest might
> test the pending status.
> 
> We can implement masking in the system emulator in userspace by using the
> assign/deassign ioctls: when a message is masked, we simply deassign all
> its eventfds, and when it is unmasked, we assign them back.
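The masking behaviour described above can be modeled in a few lines of C. This is only a sketch: struct msi_vector and the vector_* functions are hypothetical and are not KVM or qemu code. A request made while masked sets the pending bit instead of being dropped, several such requests collapse into a single delivery on unmask, and unmasking with nothing pending sends nothing.

```c
#include <stdbool.h>

/* Hypothetical model of one MSI vector; not a KVM or qemu structure. */
struct msi_vector {
	bool masked;    /* guest has masked this message */
	bool pending;   /* requested while masked; guest may test this */
	unsigned sent;  /* interrupts actually delivered to the guest */
};

/* The device (or kernel) requests the message. */
static void vector_signal(struct msi_vector *v)
{
	if (v->masked)
		v->pending = true;  /* remember the request, never drop it */
	else
		v->sent++;
}

static void vector_mask(struct msi_vector *v)
{
	v->masked = true;
}

/* On unmask, deliver exactly one message if any request arrived while masked. */
static void vector_unmask(struct msi_vector *v)
{
	v->masked = false;
	if (v->pending) {
		v->pending = false;
		v->sent++;
	}
}
```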
> 
> Here's some code to illustrate how this all works. The assign/deassign code
> in the kernel looks like the following:
> 
> This is called to unmask an interrupt:
> 
> static int
> kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> {
> 	struct _irqfd *irqfd, *tmp;
> 	struct file *file = NULL;
> 	struct eventfd_ctx *eventfd = NULL;
> 	int ret;
> 	unsigned int events;
> 
> 	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> 
> ...
> 
> 	file = eventfd_fget(fd);
> 	if (IS_ERR(file)) {
> 		ret = PTR_ERR(file);
> 		goto fail;
> 	}
> 
> 	eventfd = eventfd_ctx_fileget(file);
> 	if (IS_ERR(eventfd)) {
> 		ret = PTR_ERR(eventfd);
> 		goto fail;
> 	}
> 
> 	irqfd->eventfd = eventfd;
> 
> 	/*
> 	 * Install our own custom wake-up handling so we are notified via
> 	 * a callback whenever someone signals the underlying eventfd
> 	 */
> 	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> 	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> 
> 	spin_lock_irq(&kvm->irqfds.lock);
> 
> 	events = file->f_op->poll(file, &irqfd->pt);
> 
> 	list_add_tail(&irqfd->list, &kvm->irqfds.items);
> 	spin_unlock_irq(&kvm->irqfds.lock);
> 
> A.
> 	/*
> 	 * Check if there was an event already pending on the eventfd
> 	 * before we registered, and trigger it as if we didn't miss it.
> 	 */
> 	if (events & POLLIN)
> 		schedule_work(&irqfd->inject);
> 
> 	/*
> 	 * do not drop the file until the irqfd is fully initialized, otherwise
> 	 * we might race against the POLLHUP
> 	 */
> 	fput(file);
> 
> 	return 0;
> 
> fail:
> 	...
> }
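Check A relies on the fact that an eventfd reports POLLIN whenever its counter is non-zero. The sketch below (the helper efd_has_pollin is hypothetical) samples that state with poll(2) in userspace. It assumes, as a simplification, a consumer that is woken up but never reads the counter: once the counter is non-zero it stays readable, so a later re-assign would see POLLIN for an event that was already delivered.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if poll() reports the eventfd readable
 * (counter non-zero), 0 if not, -1 on error. This mirrors the
 * "events & POLLIN" test at A and does not consume the counter.
 */
static int efd_has_pollin(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	/* Timeout 0: only sample the current state, never block. */
	if (poll(&pfd, 1, 0) < 0)
		return -1;
	return (pfd.revents & POLLIN) ? 1 : 0;
}
```

Only a read() (or some other operation that resets the counter) clears the readable state; simply being notified does not.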

What if you do (under proper irqfd locking) something like:

	eventfd_ctx_read(ctx, 1, &cnt);
	if (irqfd->cnt != cnt) {
		irqfd->cnt = cnt;
		schedule_work(&irqfd->inject);
	}




> And deactivation deep down does this (from the irqfd_cleanup_wq workqueue,
> so this is not under the spinlock):
> 
>         /*
>          * Synchronize with the wait-queue and unhook ourselves to
>          * prevent further events.
>          */
> B.
>         remove_wait_queue(irqfd->wqh, &irqfd->wait);
> 
> 	....
> 
>         /*
>          * It is now safe to release the object's resources
>          */
>         eventfd_ctx_put(irqfd->eventfd);
>         kfree(irqfd);

And on deactivation:

	eventfd_ctx_read(ctx, 1, &irqfd->cnt);
	remove_wait_queue(irqfd->wqh, &irqfd->wait);
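One reading of the suggestion above: drain the counter when masking, so that counts for events that were already delivered are discarded, and drain it again when unmasking, injecting only if something arrived in between. A userspace sketch using a non-blocking eventfd; drain, on_mask and on_unmask_should_inject are hypothetical names, and the check is simplified to "non-zero means new" rather than comparing against a saved irqfd->cnt. It also assumes every signal made before on_mask has already been delivered; in the kernel, the read and the wait-queue removal would have to be atomic with respect to new signals for that to hold.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Read and reset the counter of an EFD_NONBLOCK eventfd. Returns the
 * accumulated count, or 0 if nothing was pending (read fails with EAGAIN).
 */
static uint64_t drain(int fd)
{
	uint64_t cnt;

	if (read(fd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
		return 0;
	return cnt;
}

/*
 * Mask: discard counts left over from events that were already delivered.
 * Assumption: every signal made before this point has been delivered.
 */
static void on_mask(int fd)
{
	(void)drain(fd);
}

/*
 * Unmask: return 1 if the device signalled while masked (inject one
 * interrupt, however many signals arrived), 0 otherwise (no spurious
 * interrupt).
 */
static int on_unmask_should_inject(int fd)
{
	return drain(fd) != 0;
}
```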




- Davide


