Re: [PATCH] fuse: clear FR_PENDING without holding fiq lock for uring requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2/4/25 21:38, Bernd Schubert wrote:
> 
> 
> On 2/4/25 21:29, Joanne Koong wrote:
>> On Tue, Feb 4, 2025 at 12:00 PM Bernd Schubert
>> <bernd.schubert@xxxxxxxxxxx> wrote:
>>>
>>>
>>>
>>> On 2/4/25 20:26, Joanne Koong wrote:
>>>> Hi Bernd,
>>>>
>>>> On Tue, Feb 4, 2025 at 3:03 AM Bernd Schubert
>>>> <bernd.schubert@xxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi Joanne,
>>>>>
>>>>> On 2/3/25 19:50, Joanne Koong wrote:
>>>>>> req->flags is set/tested/cleared atomically in fuse. When the FR_PENDING
>>>>>> bit is cleared from the request flags when assigning a request to a
>>>>>> uring entry, the fiq->lock does not need to be held.
>>>>>>
>>>>>> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
>>>>>> Fixes: c090c8abae4b6 ("fuse: Add io-uring sqe commit and fetch support")
>>>>>> ---
>>>>>>  fs/fuse/dev_uring.c | 2 --
>>>>>>  1 file changed, 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>>>>>> index ab8c26042aa8..42389d3e7235 100644
>>>>>> --- a/fs/fuse/dev_uring.c
>>>>>> +++ b/fs/fuse/dev_uring.c
>>>>>> @@ -764,9 +764,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
>>>>>>                       ent->state);
>>>>>>       }
>>>>>>
>>>>>> -     spin_lock(&fiq->lock);
>>>>>>       clear_bit(FR_PENDING, &req->flags);
>>>>>> -     spin_unlock(&fiq->lock);
>>>>>>       ent->fuse_req = req;
>>>>>>       ent->state = FRRS_FUSE_REQ;
>>>>>>       list_move(&ent->list, &queue->ent_w_req_queue);
>>>>>
>>>>> I think that would have an issue in request_wait_answer(). Let's say
>>>>>
>>>>>
>>>>> task-A, request_wait_answer(),
>>>>>                 spin_lock(&fiq->lock);
>>>>>                 /* Request is not yet in userspace, bail out */
>>>>>                 if (test_bit(FR_PENDING, &req->flags)) {  // ========> if passed
>>>>>                         list_del(&req->list);  // --> removes from the list
>>>>>
>>>>> task-B,
>>>>> fuse_uring_add_req_to_ring_ent()
>>>>>         clear_bit(FR_PENDING, &req->flags);
>>>>>         ent->fuse_req = req;
>>>>>         ent->state = FRRS_FUSE_REQ;
>>>>>         list_move_tail(&ent->list, &queue->ent_w_req_queue);
>>>>>         fuse_uring_add_to_pq(ent, req);  // ==> Add to list
>>>>>
>>>>>
>>>>>
>>>>> What I mean is, task-A passes the if, but is then slower than task-B. I.e.
>>>>> task-B runs fuse_uring_add_to_pq() before task-B does the list_del.
>>>>>
>>>>
>>>> Is this race condition possible given that fiq->ops->send_req() is
>>>> called (and completed) before request_wait_answer() is called? The
>>>> path I see is this:
>>>>
>>>> __fuse_simple_request()
>>>>     __fuse_request_send()
>>>>         fuse_send_one()
>>>>             fiq->ops->send_req()
>>>>                   fuse_uring_queue_fuse_req()
>>>>                       fuse_uring_add_req_to_ring_ent()
>>>>                            clear FR_PENDING bit
>>>>                            fuse_uring_add_to_pq()
>>>>         request_wait_answer()
>>>>
>>>> It doesn't seem like task A can call request_wait_answer() while task
>>>> B is running fuse_uring_queue_fuse_req() on the same request while the
>>>> request still has the FR_PENDING bit set.
>>>>
>>>> This case of task A running request_wait_answer() while task B is
>>>> executing fuse_uring_add_req_to_ring_ent() can happen through
>>>> fuse_uring_commit_fetch() ->  fuse_uring_add_req_to_ring_ent(), but at
>>>> that point the FR_PENDING flag will have already been cleared on the
>>>> request, so this would bypass the "if (test_bit(FR_PENDING,...))"
>>>> check in request_wait_answer().
>>>
>>> I mean this case. I don't think FR_PENDING is cleared - why should it?
>>> And where? The request is pending state, waiting to get into 'FR_SENT'?
>>>
>>>>
>>>> Is there something I'm missing? I think if this race condition is
>>>> possible, then we also have a bigger problem where the request can be
>>>> freed out in this request_wait_answer() ->  if (test_bit(FR_PENDING,
>>>> &req->flags))...  case while fuse_uring_add_req_to_ring_ent() ->
>>>> fuse_uring_add_to_pq() dereferences it still.
>>>
>>> I don't think so, if we take the lock.
>>>
>>
>> the path I'm looking at is this:
>>
>> task A -
>> __fuse_simple_request()
>>     fuse_get_req() -> request is allocated (req refcount is 1)
>>     __fuse_request_send()
>>         __fuse_get_request() -> req refcount is 2
>>         fuse_send_one() -> req gets sent to uring
>>         request_wait_answer()
>>                ...
>>                hits the interrupt case, goes into "if
>> test_bit(FR_PENDING, ...)" case which calls __fuse_put_request(), req
>> refcount is now 1
>>     fuse_put_request() -> req refcount is dropped to 0, request is freed
>>
>> while in task B -
>> fuse_uring_commit_fetch()
>>     fuse_uring_next_fuse_req()
>>         fuse_uring_ent_assign_req()
>>             gets req off fuse_req_queue
>>             fuse_uring_add_req_to_ring_ent()
>>                  clear FR_PENDING
>>                  fuse_uring_add_to_pq()
>>                      dereferences req
>>
>> if task A hits the interrupt case in request_wait_answer() and then
>> calls fuse_put_request() before task B clears the pending flag (and
>> after it's gotten the request from the fuse_req_queue in
>> fuse_uring_ent_assign_req()), then I think we hit this case, no?
>>
> 
> Oh no, yes, you are right. It is a bit ugly to use fiq lock for list
> handling. I think I'm going to add uring handler for that to
> request_wait_answer. In general, basically request_wait_answer
> is currently operating on the wrong list - it assumes fiq, but that
> is not where the request it waiting on.

Please see the attached patch, I need to think about a way to test this
and will send out properly tomorrow. So far it is only basically
compilation tested.


Thanks,
Bernd
fuse: {io-uring} Fix a possible req cancellation race

From: Bernd Schubert <bschubert@xxxxxxx>

task-A (application) might be in request_wait_answer and
try to remove the request when it has FR_PENDING set.

task-B (a fuse-server io-uring task) might handle this
request with FUSE_IO_URING_CMD_COMMIT_AND_FETCH, when
fetching the next request and accessed the req from
the pending list in fuse_uring_ent_assign_req().
That code path was not protected by fiq->lock and so
might race with task-A.

For scaling reasons we better don't use fiq->lock, but
add a handler to remove canceled requests from the queue.

Fixes: c090c8abae4b ("fuse: Add io-uring sqe commit and fetch support")
Reported-by: Joanne Koong <joannelkoong@xxxxxxxxx>
Closes: https://lore.kernel.org/all/CAJnrk1ZgHNb78dz-yfNTpxmW7wtT88A=m-zF0ZoLXKLUHRjNTw@xxxxxxxxxxxxxx/
Signed-off-by: Bernd Schubert <bschubert@xxxxxxx>

--
Compilation tested only
---
 fs/fuse/dev.c         |   25 ++++++++++++++++---------
 fs/fuse/dev_uring.c   |   25 +++++++++++++++++++++----
 fs/fuse/dev_uring_i.h |    6 ++++++
 fs/fuse/fuse_dev_i.h  |    2 ++
 fs/fuse/fuse_i.h      |    2 ++
 5 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 80a11ef4b69a..0494ea47893a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -157,7 +157,7 @@ static void __fuse_get_request(struct fuse_req *req)
 }
 
 /* Must be called with > 1 refcount */
-static void __fuse_put_request(struct fuse_req *req)
+void __fuse_put_request(struct fuse_req *req)
 {
 	refcount_dec(&req->count);
 }
@@ -529,16 +529,23 @@ static void request_wait_answer(struct fuse_req *req)
 		if (!err)
 			return;
 
-		spin_lock(&fiq->lock);
-		/* Request is not yet in userspace, bail out */
-		if (test_bit(FR_PENDING, &req->flags)) {
-			list_del(&req->list);
+		if (test_bit(FR_URING, &req->flags)) {
+			bool removed = fuse_uring_remove_pending_req(req);
+
+			if (removed)
+				return;
+		} else {
+			spin_lock(&fiq->lock);
+			/* Request is not yet in userspace, bail out */
+			if (test_bit(FR_PENDING, &req->flags)) {
+				list_del(&req->list);
+				spin_unlock(&fiq->lock);
+				__fuse_put_request(req);
+				req->out.h.error = -EINTR;
+				return;
+			}
 			spin_unlock(&fiq->lock);
-			__fuse_put_request(req);
-			req->out.h.error = -EINTR;
-			return;
 		}
-		spin_unlock(&fiq->lock);
 	}
 
 	/*
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 1e2bceb4ff1e..f9abdcf5f7e6 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -771,8 +771,6 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
 					   struct fuse_req *req)
 {
 	struct fuse_ring_queue *queue = ent->queue;
-	struct fuse_conn *fc = req->fm->fc;
-	struct fuse_iqueue *fiq = &fc->iq;
 
 	lockdep_assert_held(&queue->lock);
 
@@ -782,9 +780,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
 			ent->state);
 	}
 
-	spin_lock(&fiq->lock);
 	clear_bit(FR_PENDING, &req->flags);
-	spin_unlock(&fiq->lock);
 	ent->fuse_req = req;
 	ent->state = FRRS_FUSE_REQ;
 	list_move_tail(&ent->list, &queue->ent_w_req_queue);
@@ -1285,6 +1281,8 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
 	if (unlikely(queue->stopped))
 		goto err_unlock;
 
+	set_bit(FR_URING, &req->flags);
+	req->ring_queue = queue;
 	ent = list_first_entry_or_null(&queue->ent_avail_queue,
 				       struct fuse_ring_ent, list);
 	if (ent)
@@ -1323,6 +1321,8 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
 		return false;
 	}
 
+	set_bit(FR_URING, &req->flags);
+	req->ring_queue = queue;
 	list_add_tail(&req->list, &queue->fuse_req_bg_queue);
 
 	ent = list_first_entry_or_null(&queue->ent_avail_queue,
@@ -1353,6 +1353,23 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
 	return true;
 }
 
+bool fuse_uring_remove_pending_req(struct fuse_req *req)
+{
+	struct fuse_ring_queue *queue = req->ring_queue;
+
+	spin_lock(&queue->lock);
+	if (test_bit(FR_PENDING, &req->flags)) {
+		list_del(&req->list);
+		spin_unlock(&queue->lock);
+		__fuse_put_request(req);
+		req->out.h.error = -EINTR;
+		return true;
+	}
+	spin_unlock(&queue->lock);
+
+	return false;
+}
+
 static const struct fuse_iqueue_ops fuse_io_uring_ops = {
 	/* should be send over io-uring as enhancement */
 	.send_forget = fuse_dev_queue_forget,
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index a37991d17d34..86071758628f 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -143,6 +143,7 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
 void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
 bool fuse_uring_queue_bq_req(struct fuse_req *req);
 bool fuse_uring_request_expired(struct fuse_conn *fc);
+bool fuse_uring_remove_pending_req(struct fuse_req *req);
 
 static inline void fuse_uring_abort(struct fuse_conn *fc)
 {
@@ -206,6 +207,11 @@ static inline bool fuse_uring_request_expired(struct fuse_conn *fc)
 	return false;
 }
 
+static inline bool fuse_uring_remove_pending_req(struct fuse_req *req)
+{
+	return false;
+}
+
 #endif /* CONFIG_FUSE_IO_URING */
 
 #endif /* _FS_FUSE_DEV_URING_I_H */
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 19c29c6000a7..36b9092061ea 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -49,6 +49,8 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
 unsigned int fuse_req_hash(u64 unique);
 struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
 
+void __fuse_put_request(struct fuse_req *req);
+
 void fuse_dev_end_requests(struct list_head *head);
 
 void fuse_copy_init(struct fuse_copy_state *cs, int write,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index dcc1c327a057..29a7a6e57577 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -408,6 +408,7 @@ enum fuse_req_flag {
 	FR_FINISHED,
 	FR_PRIVATE,
 	FR_ASYNC,
+	FR_URING,
 };
 
 /**
@@ -457,6 +458,7 @@ struct fuse_req {
 
 #ifdef CONFIG_FUSE_IO_URING
 	void *ring_entry;
+	void *ring_queue;
 #endif
 	/** When (in jiffies) the request was created */
 	unsigned long create_time;

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux