On 1/14/22 7:33 AM, Pavel Begunkov wrote:
> On 1/14/22 13:47, Jens Axboe wrote:
>> On 1/14/22 4:59 AM, Pavel Begunkov wrote:
>>> Fixes a problem described in 50252e4b5e989
>>> ("aio: fix use-after-free due to missing POLLFREE handling")
>>> and copies the approach used there.
>>>
>>> In short, we have to forcibly eject a poll entry when we meet POLLFREE.
>>> We can't rely on io_poll_get_ownership() as can't wait for potentially
>>> running tw handlers, so we use the fact that wqs are RCU freed. See
>>> Eric's patch and comments for more details.
>>>
>>> Reported-by: Eric Biggers <ebiggers@xxxxxxxxxx>
>>> Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@xxxxxxxxxx
>>> Reported-and-tested-by: syzbot+5426c7ed6868c705ca14@xxxxxxxxxxxxxxxxxxxxxxxxx
>>> Fixes: 221c5eb233823 ("io_uring: add support for IORING_OP_POLL")
>>> Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
>>> ---
>>>  fs/io_uring.c | 60 +++++++++++++++++++++++++++++++++++++++++++--------
>>>  1 file changed, 51 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index fa3277844d2e..bc424af1833b 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -5462,12 +5462,14 @@ static void io_init_poll_iocb(struct io_poll_iocb *poll, __poll_t events,
>>>  
>>>  static inline void io_poll_remove_entry(struct io_poll_iocb *poll)
>>>  {
>>> -	struct wait_queue_head *head = poll->head;
>>> +	struct wait_queue_head *head = smp_load_acquire(&poll->head);
>>>  
>>> -	spin_lock_irq(&head->lock);
>>> -	list_del_init(&poll->wait.entry);
>>> -	poll->head = NULL;
>>> -	spin_unlock_irq(&head->lock);
>>> +	if (head) {
>>> +		spin_lock_irq(&head->lock);
>>> +		list_del_init(&poll->wait.entry);
>>> +		poll->head = NULL;
>>> +		spin_unlock_irq(&head->lock);
>>> +	}
>>>  }
>>>  
>>>  static void io_poll_remove_entries(struct io_kiocb *req)
>>> @@ -5475,10 +5477,26 @@ static void io_poll_remove_entries(struct io_kiocb *req)
>>>  	struct io_poll_iocb *poll = io_poll_get_single(req);
>>>  	struct io_poll_iocb *poll_double = io_poll_get_double(req);
>>>  
>>> -	if (poll->head)
>>> -		io_poll_remove_entry(poll);
>>> -	if (poll_double && poll_double->head)
>>> +	/*
>>> +	 * While we hold the waitqueue lock and the waitqueue is nonempty,
>>> +	 * wake_up_pollfree() will wait for us.  However, taking the waitqueue
>>> +	 * lock in the first place can race with the waitqueue being freed.
>>> +	 *
>>> +	 * We solve this as eventpoll does: by taking advantage of the fact that
>>> +	 * all users of wake_up_pollfree() will RCU-delay the actual free.  If
>>> +	 * we enter rcu_read_lock() and see that the pointer to the queue is
>>> +	 * non-NULL, we can then lock it without the memory being freed out from
>>> +	 * under us.
>>> +	 *
>>> +	 * Keep holding rcu_read_lock() as long as we hold the queue lock, in
>>> +	 * case the caller deletes the entry from the queue, leaving it empty.
>>> +	 * In that case, only RCU prevents the queue memory from being freed.
>>> +	 */
>>> +	rcu_read_lock();
>>> +	io_poll_remove_entry(poll);
>>> +	if (poll_double)
>>>  		io_poll_remove_entry(poll_double);
>>> +	rcu_read_unlock();
>>>  }
>>>  
>>>  /*
>>> @@ -5618,13 +5636,37 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
>>>  						 wait);
>>>  	__poll_t mask = key_to_poll(key);
>>>  
>>> +	if (unlikely(mask & POLLFREE)) {
>>> +		io_poll_mark_cancelled(req);
>>> +		/* we have to kick tw in case it's not already */
>>> +		io_poll_execute(req, 0);
>>> +
>>> +		/*
>>> +		 * If the waitqueue is being freed early but someone is already
>>> +		 * holds ownership over it, we have to tear down the request as
>>> +		 * best we can. That means immediately removing the request from
>>> +		 * its waitqueue and preventing all further accesses to the
>>> +		 * waitqueue via the request.
>>> +		 */
>>> +		list_del_init(&poll->wait.entry);
>>> +
>>> +		/*
>>> +		 * Careful: this *must* be the last step, since as soon
>>> +		 * as req->head is NULL'ed out, the request can be
>>> +		 * completed and freed, since aio_poll_complete_work()
>>> +		 * will no longer need to take the waitqueue lock.
>>> +		 */
>>> +		smp_store_release(&poll->head, NULL);
>>> +		return 1;
>>> +	}
>>> +
>>>  	/* for instances that support it check for an event match first */
>>>  	if (mask && !(mask & poll->events))
>>>  		return 0;
>>>  
>>>  	if (io_poll_get_ownership(req)) {
>>>  		/* optional, saves extra locking for removal in tw handler */
>>> -		if (mask && poll->events & EPOLLONESHOT) {
>>> +		if (mask && (poll->events & EPOLLONESHOT)) {
>>>  			list_del_init(&poll->wait.entry);
>>>  			poll->head = NULL;
>>>  		}
>>
>> Nice work, and good job documenting it too. Just one minor comment -
>
> Comments are copy-pasted from aio, all credit to Eric

Well, good job to Eric then :-)

>> this last change here seems like it was a leftover thing, mind if I drop
>> this non-functional change from the patch?
>
> Sure, it doesn't hurt but whatever way is easier

OK done, thanks for the fix, applied.

-- 
Jens Axboe
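For context on why rcu_read_lock() is sufficient in io_poll_remove_entries():
the fix leans on the contract that every caller of wake_up_pollfree() delays
the actual free of the waitqueue memory by an RCU grace period. Below is a
rough, illustrative sketch of that waker-side pattern; the struct and function
names (example_ctx, example_ctx_destroy) are invented for this example and are
not from the patch. Real users of the pattern include signalfd and binder.

#include <linux/wait.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct example_ctx {
	wait_queue_head_t wqh;	/* pollers (including io_uring) queue here */
	struct rcu_head rcu;
};

static void example_ctx_free_rcu(struct rcu_head *rcu)
{
	kfree(container_of(rcu, struct example_ctx, rcu));
}

static void example_ctx_destroy(struct example_ctx *ctx)
{
	/*
	 * Notify all waiters that the waitqueue is going away so they can
	 * unhook themselves; io_poll_wake() reacts to this via POLLFREE.
	 */
	wake_up_pollfree(&ctx->wqh);

	/*
	 * Delay the actual free by an RCU grace period. This is the
	 * guarantee that lets io_poll_remove_entries() dereference
	 * poll->head under rcu_read_lock() without a use-after-free.
	 */
	call_rcu(&ctx->rcu, example_ctx_free_rcu);
}

On the io_uring side, note the smp_store_release()/smp_load_acquire() pairing
on poll->head in the patch: the POLLFREE path removes the entry from the queue
before clearing poll->head, so a task that still observes a non-NULL head can
safely take the queue lock, while one that sees NULL knows it must not touch
the waitqueue at all.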