On 2/5/25 16:53, Jeff Layton wrote: > On Tue, 2025-02-04 at 22:31 +0100, Bernd Schubert wrote: >> fuse: {io-uring} Fix a possible req cancellation race >> >> From: Bernd Schubert <bschubert@xxxxxxx> >> >> task-A (application) might be in request_wait_answer and >> try to remove the request when it has FR_PENDING set. >> >> task-B (a fuse-server io-uring task) might handle this >> request with FUSE_IO_URING_CMD_COMMIT_AND_FETCH, when >> fetching the next request and accessed the req from >> the pending list in fuse_uring_ent_assign_req(). >> That code path was not protected by fiq->lock and so >> might race with task-A. >> >> For scaling reasons we better don't use fiq->lock, but >> add a handler to remove canceled requests from the queue. >> >> Fixes: c090c8abae4b ("fuse: Add io-uring sqe commit and fetch support") >> Reported-by: Joanne Koong <joannelkoong@xxxxxxxxx> >> Closes: https://lore.kernel.org/all/CAJnrk1ZgHNb78dz-yfNTpxmW7wtT88A=m-zF0ZoLXKLUHRjNTw@xxxxxxxxxxxxxx/ >> Signed-off-by: Bernd Schubert <bschubert@xxxxxxx> >> >> -- >> Compilation tested only >> --- >> fs/fuse/dev.c | 25 ++++++++++++++++--------- >> fs/fuse/dev_uring.c | 25 +++++++++++++++++++++---- >> fs/fuse/dev_uring_i.h | 6 ++++++ >> fs/fuse/fuse_dev_i.h | 2 ++ >> fs/fuse/fuse_i.h | 2 ++ >> 5 files changed, 47 insertions(+), 13 deletions(-) >> >> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c >> index 80a11ef4b69a..0494ea47893a 100644 >> --- a/fs/fuse/dev.c >> +++ b/fs/fuse/dev.c >> @@ -157,7 +157,7 @@ static void __fuse_get_request(struct fuse_req *req) >> } >> >> /* Must be called with > 1 refcount */ >> -static void __fuse_put_request(struct fuse_req *req) >> +void __fuse_put_request(struct fuse_req *req) >> { >> refcount_dec(&req->count); >> } >> @@ -529,16 +529,23 @@ static void request_wait_answer(struct fuse_req *req) >> if (!err) >> return; >> >> - spin_lock(&fiq->lock); >> - /* Request is not yet in userspace, bail out */ >> - if (test_bit(FR_PENDING, &req->flags)) { >> - list_del(&req->list); >> + if (test_bit(FR_URING, &req->flags)) { >> + bool removed = fuse_uring_remove_pending_req(req); >> + >> + if (removed) >> + return; >> + } else { >> + spin_lock(&fiq->lock); >> + /* Request is not yet in userspace, bail out */ >> + if (test_bit(FR_PENDING, &req->flags)) { >> + list_del(&req->list); >> + spin_unlock(&fiq->lock); >> + __fuse_put_request(req); >> + req->out.h.error = -EINTR; >> + return; >> + } > > One thing that bothers me with the existing code and this patch is that > the semantics around FR_PENDING are unclear. > > I know it's supposed to mean that the req is waiting for userland to > read it, but in the above case for instance, we're removing it from the > list and dropping its refcount while leaving the bit set. Shouldn't we > clear it there and in fuse_uring_remove_pending_req()? > Yeah, I think the assumption from the code is that the request is now getting destructed, I will add in to clear FR_PENDING. Thanks, Bernd