When posted interrupts are in use, KVM fully bypasses the eventfd and
delivers events directly to the appropriate vCPU. Without posted
interrupts, KVM still receives events via the eventfd, but that does not
stop userspace from receiving the same events too. This leaves userspace
having to take care not to act on events that KVM has already handled,
so that it does not inject duplicate interrupts into the guest.

Fix it by adding a 'priority' mode for exclusive waiters, which puts them
at the head of the wait queue, where they can consume events before the
non-exclusive waiters are woken.

v2:
 • Drop [RFC]. This seems to be working nicely, and userspace is a lot
   cleaner without having to mess around with adding the eventfd to, and
   removing it from, its poll set. And nobody yelled at me. Yet.
 • Reword commit comments, update comment above __wake_up_common()
 • Rebase to be applied after the (only vaguely related) fix to make
   irqfd actually consume the eventfd counter too.

David Woodhouse (2):
  sched/wait: Add add_wait_queue_priority()
  kvm/eventfd: Use priority waitqueue to catch events before userspace

 include/linux/wait.h | 12 +++++++++++-
 kernel/sched/wait.c  | 17 ++++++++++++++++-
 virt/kvm/eventfd.c   |  6 ++++--