On 2/4/25 21:38, Bernd Schubert wrote: > > > On 2/4/25 21:29, Joanne Koong wrote: >> On Tue, Feb 4, 2025 at 12:00 PM Bernd Schubert >> <bernd.schubert@xxxxxxxxxxx> wrote: >>> >>> >>> >>> On 2/4/25 20:26, Joanne Koong wrote: >>>> Hi Bernd, >>>> >>>> On Tue, Feb 4, 2025 at 3:03 AM Bernd Schubert >>>> <bernd.schubert@xxxxxxxxxxx> wrote: >>>>> >>>>> Hi Joanne, >>>>> >>>>> On 2/3/25 19:50, Joanne Koong wrote: >>>>>> req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING >>>>>> bit is cleared from the request flags when assigning a request to a >>>>>> uring entry, the fiq->lock does not need to be held. >>>>>> >>>>>> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> >>>>>> Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support") >>>>>> --- >>>>>> fs/fuse/dev_uring.c | 2 -- >>>>>> 1 file changed, 2 deletions(-) >>>>>> >>>>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c >>>>>> index ab8c26042aa8..42389d3e7235 100644 >>>>>> --- a/fs/fuse/dev_uring.c >>>>>> +++ b/fs/fuse/dev_uring.c >>>>>> @@ -764,9 +764,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent, >>>>>> ent->state); >>>>>> } >>>>>> >>>>>> - spin_lock(&fiq->lock); >>>>>> clear_bit(FR_PENDING, &req->flags); >>>>>> - spin_unlock(&fiq->lock); >>>>>> ent->fuse_req = req; >>>>>> ent->state = FRRS_FUSE_REQ; >>>>>> list_move(&ent->list, &queue->ent_w_req_queue); >>>>> >>>>> I think that would have an issue in request_wait_answer(). Let's say >>>>> >>>>> >>>>> task-A, request_wait_answer(), >>>>> spin_lock(&fiq->lock); >>>>> /* Request is not yet in userspace, bail out */ >>>>> if (test_bit(FR_PENDING, &req->flags)) { // ========> if passed >>>>> list_del(&req->list); // --> removes from the list >>>>> >>>>> task-B, >>>>> fuse_uring_add_req_to_ring_ent() >>>>> clear_bit(FR_PENDING, &req->flags); >>>>> ent->fuse_req = req; >>>>> ent->state = FRRS_FUSE_REQ; >>>>> list_move_tail(&ent->list, &queue->ent_w_req_queue); >>>>> fuse_uring_add_to_pq(ent, req); // ==> Add to list >>>>> >>>>> >>>>> >>>>> What I mean is, task-A passes the if, but is then slower than task-B. I.e. >>>>> task-B runs fuse_uring_add_to_pq() before task-B does the list_del. >>>>> >>>> >>>> Is this race condition possible given that fiq->ops->send_req() is >>>> called (and completed) before request_wait_answer() is called? The >>>> path I see is this: >>>> >>>> __fuse_simple_request() >>>> __fuse_request_send() >>>> fuse_send_one() >>>> fiq->ops->send_req() >>>> fuse_uring_queue_fuse_req() >>>> fuse_uring_add_req_to_ring_ent() >>>> clear FR_PENDING bit >>>> fuse_uring_add_to_pq() >>>> request_wait_answer() >>>> >>>> It doesn't seem like task A can call request_wait_answer() while task >>>> B is running fuse_uring_queue_fuse_req() on the same request while the >>>> request still has the FR_PENDING bit set. >>>> >>>> This case of task A running request_wait_answer() while task B is >>>> executing fuse_uring_add_req_to_ring_ent() can happen through >>>> fuse_uring_commit_fetch() -> fuse_uring_add_req_to_ring_ent(), but at >>>> that point the FR_PENDING flag will have already been cleared on the >>>> request, so this would bypass the "if (test_bit(FR_PENDING,...))" >>>> check in request_wait_answer(). >>> >>> I mean this case. I don't think FR_PENDING is cleared - why should it? >>> And where? The request is pending state, waiting to get into 'FR_SENT'? >>> >>>> >>>> Is there something I'm missing? I think if this race condition is >>>> possible, then we also have a bigger problem where the request can be >>>> freed out in this request_wait_answer() -> if (test_bit(FR_PENDING, >>>> &req->flags))... case while fuse_uring_add_req_to_ring_ent() -> >>>> fuse_uring_add_to_pq() dereferences it still. >>> >>> I don't think so, if we take the lock. >>> >> >> the path I'm looking at is this: >> >> task A - >> __fuse_simple_request() >> fuse_get_req() -> request is allocated (req refcount is 1) >> __fuse_request_send() >> __fuse_get_request() -> req refcount is 2 >> fuse_send_one() -> req gets sent to uring >> request_wait_answer() >> ... >> hits the interrupt case, goes into "if >> test_bit(FR_PENDING, ...)" case which calls __fuse_put_request(), req >> refcount is now 1 >> fuse_put_request() -> req refcount is dropped to 0, request is freed >> >> while in task B - >> fuse_uring_commit_fetch() >> fuse_uring_next_fuse_req() >> fuse_uring_ent_assign_req() >> gets req off fuse_req_queue >> fuse_uring_add_req_to_ring_ent() >> clear FR_PENDING >> fuse_uring_add_to_pq() >> dereferences req >> >> if task A hits the interrupt case in request_wait_answer() and then >> calls fuse_put_request() before task B clears the pending flag (and >> after it's gotten the request from the fuse_req_queue in >> fuse_uring_ent_assign_req()), then I think we hit this case, no? >> > > Oh no, yes, you are right. It is a bit ugly to use fiq lock for list > handling. I think I'm going to add uring handler for that to > request_wait_answer. In general, basically request_wait_answer > is currently operating on the wrong list - it assumes fiq, but that > is not where the request it waiting on. Please see the attached patch, I need to think about a way to test this and will send out properly tomorrow. So far it is only basically compilation tested. Thanks, Bernd
fuse: {io-uring} Fix a possible req cancellation race From: Bernd Schubert <bschubert@xxxxxxx> task-A (application) might be in request_wait_answer and try to remove the request when it has FR_PENDING set. task-B (a fuse-server io-uring task) might handle this request with FUSE_IO_URING_CMD_COMMIT_AND_FETCH, when fetching the next request and accessed the req from the pending list in fuse_uring_ent_assign_req(). That code path was not protected by fiq->lock and so might race with task-A. For scaling reasons we better don't use fiq->lock, but add a handler to remove canceled requests from the queue. Fixes: c090c8abae4b ("fuse: Add io-uring sqe commit and fetch support") Reported-by: Joanne Koong <joannelkoong@xxxxxxxxx> Closes: https://lore.kernel.org/all/CAJnrk1ZgHNb78dz-yfNTpxmW7wtT88A=m-zF0ZoLXKLUHRjNTw@xxxxxxxxxxxxxx/ Signed-off-by: Bernd Schubert <bschubert@xxxxxxx> -- Compilation tested only --- fs/fuse/dev.c | 25 ++++++++++++++++--------- fs/fuse/dev_uring.c | 25 +++++++++++++++++++++---- fs/fuse/dev_uring_i.h | 6 ++++++ fs/fuse/fuse_dev_i.h | 2 ++ fs/fuse/fuse_i.h | 2 ++ 5 files changed, 47 insertions(+), 13 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 80a11ef4b69a..0494ea47893a 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -157,7 +157,7 @@ static void __fuse_get_request(struct fuse_req *req) } /* Must be called with > 1 refcount */ -static void __fuse_put_request(struct fuse_req *req) +void __fuse_put_request(struct fuse_req *req) { refcount_dec(&req->count); } @@ -529,16 +529,23 @@ static void request_wait_answer(struct fuse_req *req) if (!err) return; - spin_lock(&fiq->lock); - /* Request is not yet in userspace, bail out */ - if (test_bit(FR_PENDING, &req->flags)) { - list_del(&req->list); + if (test_bit(FR_URING, &req->flags)) { + bool removed = fuse_uring_remove_pending_req(req); + + if (removed) + return; + } else { + spin_lock(&fiq->lock); + /* Request is not yet in userspace, bail out */ + if (test_bit(FR_PENDING, &req->flags)) { + list_del(&req->list); + spin_unlock(&fiq->lock); + __fuse_put_request(req); + req->out.h.error = -EINTR; + return; + } spin_unlock(&fiq->lock); - __fuse_put_request(req); - req->out.h.error = -EINTR; - return; } - spin_unlock(&fiq->lock); } /* diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c index 1e2bceb4ff1e..f9abdcf5f7e6 100644 --- a/fs/fuse/dev_uring.c +++ b/fs/fuse/dev_uring.c @@ -771,8 +771,6 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent, struct fuse_req *req) { struct fuse_ring_queue *queue = ent->queue; - struct fuse_conn *fc = req->fm->fc; - struct fuse_iqueue *fiq = &fc->iq; lockdep_assert_held(&queue->lock); @@ -782,9 +780,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent, ent->state); } - spin_lock(&fiq->lock); clear_bit(FR_PENDING, &req->flags); - spin_unlock(&fiq->lock); ent->fuse_req = req; ent->state = FRRS_FUSE_REQ; list_move_tail(&ent->list, &queue->ent_w_req_queue); @@ -1285,6 +1281,8 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req) if (unlikely(queue->stopped)) goto err_unlock; + set_bit(FR_URING, &req->flags); + req->ring_queue = queue; ent = list_first_entry_or_null(&queue->ent_avail_queue, struct fuse_ring_ent, list); if (ent) @@ -1323,6 +1321,8 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req) return false; } + set_bit(FR_URING, &req->flags); + req->ring_queue = queue; list_add_tail(&req->list, &queue->fuse_req_bg_queue); ent = list_first_entry_or_null(&queue->ent_avail_queue, @@ -1353,6 +1353,23 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req) return true; } +bool fuse_uring_remove_pending_req(struct fuse_req *req) +{ + struct fuse_ring_queue *queue = req->ring_queue; + + spin_lock(&queue->lock); + if (test_bit(FR_PENDING, &req->flags)) { + list_del(&req->list); + spin_unlock(&queue->lock); + __fuse_put_request(req); + req->out.h.error = -EINTR; + return true; + } + spin_unlock(&queue->lock); + + return false; +} + static const struct fuse_iqueue_ops fuse_io_uring_ops = { /* should be send over io-uring as enhancement */ .send_forget = fuse_dev_queue_forget, diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h index a37991d17d34..86071758628f 100644 --- a/fs/fuse/dev_uring_i.h +++ b/fs/fuse/dev_uring_i.h @@ -143,6 +143,7 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags); void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req); bool fuse_uring_queue_bq_req(struct fuse_req *req); bool fuse_uring_request_expired(struct fuse_conn *fc); +bool fuse_uring_remove_pending_req(struct fuse_req *req); static inline void fuse_uring_abort(struct fuse_conn *fc) { @@ -206,6 +207,11 @@ static inline bool fuse_uring_request_expired(struct fuse_conn *fc) return false; } +static inline bool fuse_uring_remove_pending_req(struct fuse_req *req) +{ + return false; +} + #endif /* CONFIG_FUSE_IO_URING */ #endif /* _FS_FUSE_DEV_URING_I_H */ diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index 19c29c6000a7..36b9092061ea 100644 --- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -49,6 +49,8 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file) unsigned int fuse_req_hash(u64 unique); struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique); +void __fuse_put_request(struct fuse_req *req); + void fuse_dev_end_requests(struct list_head *head); void fuse_copy_init(struct fuse_copy_state *cs, int write, diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index dcc1c327a057..29a7a6e57577 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -408,6 +408,7 @@ enum fuse_req_flag { FR_FINISHED, FR_PRIVATE, FR_ASYNC, + FR_URING, }; /** @@ -457,6 +458,7 @@ struct fuse_req { #ifdef CONFIG_FUSE_IO_URING void *ring_entry; + void *ring_queue; #endif /** When (in jiffies) the request was created */ unsigned long create_time;