Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 22 Jun 2009, Gregory Haskins wrote:

> So up in userspace, the vbus pci-device would have an open reference to
> the kvm guest (derived from /dev/kvm) and an open reference to a vbus
> (derived from /dev/vbus).  Lets call these kvmfd, and vbusfd,
> respectively.  For something like an interrupt, we would hook the point
> where the PCI-MSI interrupt is assigned, and would do the following:
> 
> gsi = kvm_irq_route_gsi();
> fd = eventfd(0, 0);
> ioctl(kvmfd, KVM_IRQFD_ASSIGN, {fd, gsi});
> ioctl(vbusfd, VBUS_SHMSIGNAL_ASSIGN, {sigid, fd});
> 
> So userspace orchestrated the assignment of this one eventfd to a KVM
> consumer, and a VBUS producer.  The two subsystems do not care about the
> details of the other side of the link, per se.  VBUS just knows that it
> can eventfd_signal() its memory region to tell whomever is listening
> that it changed.  Likewise, KVM just knows to inject "gsi" when it gets
> signalled.  You could equally have given "fd" to a userspace thread for
> either producer or consumer roles, or any other combination.
> 
> If we were doing PCI-passthough, substitute the last SHMSIGNAL_ASSIGN
> ioctl call with some PCI_PASSTHROUGH_ASSIGN verb and you get the idea.
> 
> The important thing is that once this is established, userspace doesn't
> necessarily care about the fd anymore.  So now the question is: do we
> keep it around for other things?  Do we keep it around because we don't
> want KVM to see the POLLHUP, or do we address the "release" code so that
> it works even if userspace issued close(fd) at this point.  I am not
> sure what the answer is, but this is the scenario we are concerned with
> in this thread.  In the example above, vbus is free to produce events on
> its eventfd until it gets a SHMSIGNAL_DEASSIGN request.

I see.
The thing remains, that in order to reliably handle generic 
producer/consumer scenarios you'd need a reference counting similar to 
pipes, where the notion of producer and consumer is very well defined.
In your case, it'd be sufficent to expose an:

	int eventfd_wakeup(struct eventfd_ctx *ctx, void *key);

So that each end can signal, in an file* f_count independent way, when 
they're about to leave. Say with a new status bit.
The problem, that I felt coming since when I introduced the key-based 
wakeups, is that right now the "key" start to become a little restrictive 
in terms of amount of information carried by it.
If there are other key-aware waiters on "wqh", your new bit cannot 
conflict with existing ones, and you (and future users of the interface) 
will be polluting the global bit namespace.
Probably turning "key" into a pointer to a structure like:

	struct wakeup_key {
		unsigned long type;
		void *key;
	};

So the current:

	wake_up_poll(wq, POLLIN);

Would become:

	key.type = KT_POLLMASK;
	key.key = POLLIN;
	wake_up_key(wq, &key);

At that point the waiters (like poll/epoll) can simply discard the wakeup 
types they are not interested into (poll/epoll would care only about 
KT_POLLMASK), and something more than a simple bitmask can be carried by 
the wakups. Like:

	key.type = KT_SOMEDATA;
	key.key = &data;
	wake_up_key(wq, &key);

Of course, we could build another layer on top, to fit this model, but 
then we'd have the dual citizenship among normal wakeups and new-interface 
wakeups/signaling.
Another thing. Now that the interface exposes the eventfd_ctx context 
directly, the pollcb register/unregister does not have any reason to be 
inside eventfd (they're totally generic bits at that point), so my thought 
was about to drop them.



- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux