Hi Bernd, On Tue, Feb 4, 2025 at 3:03 AM Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote: > > Hi Joanne, > > On 2/3/25 19:50, Joanne Koong wrote: > > req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING > > bit is cleared from the request flags when assigning a request to a > > uring entry, the fiq->lock does not need to be held. > > > > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> > > Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support") > > --- > > fs/fuse/dev_uring.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c > > index ab8c26042aa8..42389d3e7235 100644 > > --- a/fs/fuse/dev_uring.c > > +++ b/fs/fuse/dev_uring.c > > @@ -764,9 +764,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent, > > ent->state); > > } > > > > - spin_lock(&fiq->lock); > > clear_bit(FR_PENDING, &req->flags); > > - spin_unlock(&fiq->lock); > > ent->fuse_req = req; > > ent->state = FRRS_FUSE_REQ; > > list_move(&ent->list, &queue->ent_w_req_queue); > > I think that would have an issue in request_wait_answer(). Let's say > > > task-A, request_wait_answer(), > spin_lock(&fiq->lock); > /* Request is not yet in userspace, bail out */ > if (test_bit(FR_PENDING, &req->flags)) { // ========> if passed > list_del(&req->list); // --> removes from the list > > task-B, > fuse_uring_add_req_to_ring_ent() > clear_bit(FR_PENDING, &req->flags); > ent->fuse_req = req; > ent->state = FRRS_FUSE_REQ; > list_move_tail(&ent->list, &queue->ent_w_req_queue); > fuse_uring_add_to_pq(ent, req); // ==> Add to list > > > > What I mean is, task-A passes the if, but is then slower than task-B. I.e. > task-B runs fuse_uring_add_to_pq() before task-B does the list_del. > Is this race condition possible given that fiq->ops->send_req() is called (and completed) before request_wait_answer() is called? The path I see is this: __fuse_simple_request() __fuse_request_send() fuse_send_one() fiq->ops->send_req() fuse_uring_queue_fuse_req() fuse_uring_add_req_to_ring_ent() clear FR_PENDING bit fuse_uring_add_to_pq() request_wait_answer() It doesn't seem like task A can call request_wait_answer() while task B is running fuse_uring_queue_fuse_req() on the same request while the request still has the FR_PENDING bit set. This case of task A running request_wait_answer() while task B is executing fuse_uring_add_req_to_ring_ent() can happen through fuse_uring_commit_fetch() -> fuse_uring_add_req_to_ring_ent(), but at that point the FR_PENDING flag will have already been cleared on the request, so this would bypass the "if (test_bit(FR_PENDING,...))" check in request_wait_answer(). Is there something I'm missing? I think if this race condition is possible, then we also have a bigger problem where the request can be freed out in this request_wait_answer() -> if (test_bit(FR_PENDING, &req->flags))... case while fuse_uring_add_req_to_ring_ent() -> fuse_uring_add_to_pq() dereferences it still. Thanks, Joanne > > Now the ring entry gets handled by fuse-server, comes back to fuse-client > and does not find the request anymore, because both tasks raced. > The entire ring entry will be lost - it not be used anymore for > the live time of the connection. > > And the other issue might be total list corruption, because two tasks might > access the list at the same time. > > > Thanks, > Bernd >