On Fri, 19 Jun 2009, Gregory Haskins wrote:

> Davide Libenzi wrote:
> > On Fri, 19 Jun 2009, Gregory Haskins wrote:
> >
> >
> >> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
> >> notifier->release() callback.  This lets notification clients know if
> >> the eventfd is about to go away and is very useful particularly for
> >> in-kernel clients.  However, as it stands today it is not possible to
> >> use the notification API in a race-free way.  This patch adds some
> >> additional logic to the notification subsystem to rectify this problem.
> >>
> >> Background:
> >> -----------------------
> >> Eventfd currently only has one reference count mechanism: fget/fput.  This
> >> in and of itself is normally fine.  However, if a client expects to be
> >> notified if the eventfd is closed, it cannot hold a fget() reference
> >> itself or the underlying f_ops->release() callback will never be invoked
> >> by VFS.  Therefore we have this somewhat unusual situation where we may
> >> hold a pointer to an eventfd object (by virtue of having a waiter registered
> >> in its wait-queue), but no reference.  This makes it nearly impossible to
> >> design a mutual decoupling algorithm: you cannot unhook one side from the
> >> other (or vice versa) without racing.
> >>
> >
> > And why is that?
> >
> > struct xxx {
> > 	struct mutex mtx;
> > 	struct file *file;
> > 	...
> > };
> >
> > struct file *xxx_get_file(struct xxx *x) {
> > 	struct file *file;
> >
> > 	mutex_lock(&x->mtx);
> > 	file = x->file;
> > 	if (!file)
> > 		mutex_unlock(&x->mtx);
> > 	return file;
> > }
> >
> > void xxx_release_file(struct xxx *x) {
> > 	mutex_unlock(&x->mtx);
> > }
> >
> > void handle_POLLHUP(struct xxx *x) {
> > 	struct file *file;
> >
> > 	file = xxx_get_file(x);
> > 	if (file) {
> > 		unhook_waitqueue(file, ...);
> > 		x->file = NULL;
> > 		xxx_release_file(x);
> > 	}
> > }
> >
> >
> > Every time you need to "use" file, you call xxx_get_file(), and if you get
> > NULL, it means it's gone and you handle it accordingly to your IRQ fd
> > policies.  As soon as you are done with the file, you call xxx_release_file().
> > Replace "mtx" with the lock that fits your needs.
> >
>
> Consider what would happen if the f_ops->release() was preempted inside
> the wake_up_locked_polled() after it dereferenced the xxx from the list,
> but before it calls the callback(POLLHUP).  The xxx object, and/or the
> .text for the xxx object may be long gone by the time it comes back
> around.  Afaict, there is no way to guard against that scenario unless
> you do something like 2/3+3/3.  Or am I missing something?

Right. Don't you see an easier answer to that, instead of adding 300 lines
of code to eventfd?
For example, turning wake_up_locked() into a normal wake_up().


- Davide
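
For anyone who wants to try the get/release pattern quoted above outside the kernel, here is a minimal userspace sketch. It rests on several assumptions: a pthread mutex stands in for "the lock that fits your needs", struct file is a hypothetical stub, the "hooked" flag replaces unhook_waitqueue(), and xxx_init() and xxx_use_file() are illustrative helpers that appear nowhere in the thread. It only shows how the file pointer is guarded. It does not cover the preemption window in f_ops->release() that Gregory describes.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file. */
struct file {
	int id;
};

struct xxx {
	pthread_mutex_t mtx;	/* stand-in for the kernel mutex */
	struct file *file;	/* NULL once POLLHUP has been handled */
	int hooked;		/* 1 while "registered" on the wait queue */
};

/* Illustrative helper, not part of the original example. */
void xxx_init(struct xxx *x, struct file *f)
{
	pthread_mutex_init(&x->mtx, NULL);
	x->file = f;
	x->hooked = (f != NULL);
}

/* Returns the file with x->mtx held, or NULL with x->mtx released. */
struct file *xxx_get_file(struct xxx *x)
{
	struct file *file;

	pthread_mutex_lock(&x->mtx);
	file = x->file;
	if (!file)
		pthread_mutex_unlock(&x->mtx);
	return file;
}

/* Drops the lock taken by a successful xxx_get_file(). */
void xxx_release_file(struct xxx *x)
{
	pthread_mutex_unlock(&x->mtx);
}

void handle_POLLHUP(struct xxx *x)
{
	struct file *file;

	file = xxx_get_file(x);
	if (file) {
		x->hooked = 0;		/* stands in for unhook_waitqueue(file, ...) */
		x->file = NULL;
		xxx_release_file(x);
	}
}

/* Illustrative user: 0 if the file was usable, -1 if it is already gone. */
int xxx_use_file(struct xxx *x)
{
	struct file *file = xxx_get_file(x);

	if (!file)
		return -1;
	/* ... use file here, with the lock held ... */
	xxx_release_file(x);
	return 0;
}
```

Calling xxx_use_file() before handle_POLLHUP() succeeds. After it, the pointer is NULL and every later user takes the "gone" path. A second handle_POLLHUP() is a no-op and leaves the lock released.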