On Mon, Jun 10, 2024 at 06:12:06AM +0100, Al Viro wrote:

> vfio_virqfd_enable() has the same problem, except that there we
> definitely can't move vfs_poll() under the lock - it's a spinlock.
>
> Could we move vfs_poll() + inject to _before_ making the thing
> public? We'd need to delay POLLHUP handling there, but then
> we need it until the moment with do inject anyway. Something
> like replacing
> 	if (!list_empty(&irqfd->list))
> 		hsm_irqfd_shutdown(irqfd);
> in hsm_irqfd_shutdown_work() with
> 	if (!list_empty(&irqfd->list))
> 		hsm_irqfd_shutdown(irqfd);
> 	else
> 		irqfd->need_shutdown = true;
> and doing
> 	if (unlikely(irqfd->need_shutdown))
> 		hsm_irqfd_shutdown(irqfd);
> 	else
> 		list_add_tail(&irqfd->list, &vm->irqfds);
> when the sucker is made visible.
>
> I'm *not* familiar with the area, though, so that might be unfeasible
> for any number of reasons.

Hmm... OK, so we rely upon EPOLLHUP being generated only upon the final
close of eventfd file. And vfio seems to have an exclusion in all callers
of vfio_virqfd_{en,dis}able(), which ought to be enough.

For drivers/virt/acrn/irqfd.c EPOLLHUP is not a problem for the same
reasons, but there's no exclusion between acrn_irqfd_assign() and
acrn_irqfd_deassign() calls. So the scenario with explicit deassign
racing with assign and leading to vfs_poll(file, <freed memory>) is
possible.

And it looks like drivers/xen/privcmd.c:privcmd_irqfd_assign() has
a similar problem...

How about the following for acrn side of things? Does anybody see
a problem with that "do vfs_poll() before making the thing visible"
approach?
diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
index d4ad211dce7a..71c431506a9b 100644
--- a/drivers/virt/acrn/irqfd.c
+++ b/drivers/virt/acrn/irqfd.c
@@ -133,7 +133,7 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	eventfd = eventfd_ctx_fileget(f.file);
 	if (IS_ERR(eventfd)) {
 		ret = PTR_ERR(eventfd);
-		goto fail;
+		goto out_file;
 	}
 
 	irqfd->eventfd = eventfd;
@@ -145,29 +145,26 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	init_waitqueue_func_entry(&irqfd->wait, hsm_irqfd_wakeup);
 	init_poll_funcptr(&irqfd->pt, hsm_irqfd_poll_func);
 
+	/* Check the pending event in this stage */
+	events = vfs_poll(f.file, &irqfd->pt);
+
+	if (events & EPOLLIN)
+		acrn_irqfd_inject(irqfd);
+
 	mutex_lock(&vm->irqfds_lock);
 	list_for_each_entry(tmp, &vm->irqfds, list) {
 		if (irqfd->eventfd != tmp->eventfd)
 			continue;
-		ret = -EBUSY;
+		hsm_irqfd_shutdown(irqfd);
 		mutex_unlock(&vm->irqfds_lock);
-		goto fail;
+		irqfd = NULL;	// consumed by hsm_irqfd_shutdown()
+		ret = -EBUSY;
+		goto out_file;
 	}
 	list_add_tail(&irqfd->list, &vm->irqfds);
+	irqfd = NULL;	// not for us to free...
 	mutex_unlock(&vm->irqfds_lock);
-
-	/* Check the pending event in this stage */
-	events = vfs_poll(f.file, &irqfd->pt);
-
-	if (events & EPOLLIN)
-		acrn_irqfd_inject(irqfd);
-
-	fdput(f);
-	return 0;
-fail:
-	if (eventfd && !IS_ERR(eventfd))
-		eventfd_ctx_put(eventfd);
-
+out_file:
 	fdput(f);
 out:
 	kfree(irqfd);
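
For anyone who wants to poke at the ordering outside the kernel, here is a
rough userspace model of the "touch it only while it is still private, then
publish and forget the pointer" idea. It is only a sketch: every demo_* name
is made up for illustration, a pthread mutex stands in for vm->irqfds_lock,
and a boolean stands in for vfs_poll() returning EPOLLIN. It says nothing
about wait queue registration or EPOLLHUP handling in the real driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct demo_irqfd {
	int eventfd_id;			/* stands in for the eventfd context */
	bool pending;			/* stands in for EPOLLIN being reported */
	struct demo_irqfd *next;
};

struct demo_vm {
	pthread_mutex_t lock;		/* plays the role of vm->irqfds_lock */
	struct demo_irqfd *irqfds;	/* published entries */
	atomic_int injected;		/* counts simulated injections */
};

/* Returns 0 on success, -1 on allocation failure or duplicate id. */
int demo_assign(struct demo_vm *vm, int eventfd_id, bool pending)
{
	struct demo_irqfd *irqfd, *tmp;
	int ret = 0;

	irqfd = calloc(1, sizeof(*irqfd));
	if (!irqfd)
		return -1;
	irqfd->eventfd_id = eventfd_id;
	irqfd->pending = pending;

	/* Initial check: nobody else can see or free the object yet. */
	if (irqfd->pending) {
		atomic_fetch_add(&vm->injected, 1);
		irqfd->pending = false;
	}

	pthread_mutex_lock(&vm->lock);
	for (tmp = vm->irqfds; tmp; tmp = tmp->next) {
		if (tmp->eventfd_id == eventfd_id) {
			ret = -1;	/* duplicate: we still own irqfd */
			goto unlock;
		}
	}
	assert(irqfd->next == NULL);	/* must not be linked anywhere yet */
	irqfd->next = vm->irqfds;
	vm->irqfds = irqfd;
	irqfd = NULL;			/* published: not ours to touch or free */
unlock:
	pthread_mutex_unlock(&vm->lock);
	free(irqfd);			/* free(NULL) is a no-op after publish */
	return ret;
}

/* Unlinks and frees a published entry; may run right after publication. */
int demo_deassign(struct demo_vm *vm, int eventfd_id)
{
	struct demo_irqfd **pp, *victim = NULL;

	pthread_mutex_lock(&vm->lock);
	for (pp = &vm->irqfds; *pp; pp = &(*pp)->next) {
		if ((*pp)->eventfd_id == eventfd_id) {
			victim = *pp;
			*pp = victim->next;
			break;
		}
	}
	pthread_mutex_unlock(&vm->lock);
	if (!victim)
		return -1;
	free(victim);
	return 0;
}
```

Since demo_assign() never dereferences the object once it is on the list, a
concurrent demo_deassign() freeing it immediately is harmless in this model.
One thing the model does make visible: a pending event gets consumed even
when the assign then fails as a duplicate. Whether that matters for the real
driver is not something this sketch can answer.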