On Mon, Aug 5, 2024 at 7:45 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote: > > > > On 8/6/24 6:53 AM, Joanne Koong wrote: > > On Mon, Aug 5, 2024 at 12:32 AM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote: > >> > >> > >> > >> On 7/30/24 8:23 AM, Joanne Koong wrote: > >>> There are situations where fuse servers can become unresponsive or take > >>> too long to reply to a request. Currently there is no upper bound on > >>> how long a request may take, which may be frustrating to users who get > >>> stuck waiting for a request to complete. > >>> > >>> This commit adds a timeout option (in seconds) for requests. If the > >>> timeout elapses before the server replies to the request, the request > >>> will fail with -ETIME. > >>> > >>> There are 3 possibilities for a request that times out: > >>> a) The request times out before the request has been sent to userspace > >>> b) The request times out after the request has been sent to userspace > >>> and before it receives a reply from the server > >>> c) The request times out after the request has been sent to userspace > >>> and the server replies while the kernel is timing out the request > >>> > >>> While a request timeout is being handled, there may be other handlers > >>> running at the same time if: > >>> a) the kernel is forwarding the request to the server > >>> b) the kernel is processing the server's reply to the request > >>> c) the request is being re-sent > >>> d) the connection is aborting > >>> e) the device is getting released > >>> > >>> Proper synchronization must be added to ensure that the request is > >>> handled correctly in all of these cases. To this effect, there is a new > >>> FR_FINISHING bit added to the request flags, which is set atomically by > >>> either the timeout handler (see fuse_request_timeout()) which is invoked > >>> after the request timeout elapses or set by the request reply handler > >>> (see dev_do_write()), whichever gets there first. If the reply handler > >>> and the timeout handler are executing simultaneously and the reply handler > >>> sets FR_FINISHING before the timeout handler, then the request will be > >>> handled as if the timeout did not elapse. If the timeout handler sets > >>> FR_FINISHING before the reply handler, then the request will fail with > >>> -ETIME and the request will be cleaned up. > >>> > >>> Currently, this is the refcount lifecycle of a request: > >>> > >>> Synchronous request is created: > >>> fuse_simple_request -> allocates request, sets refcount to 1 > >>> __fuse_request_send -> acquires refcount > >>> queues request and waits for reply... > >>> fuse_simple_request -> drops refcount > >>> > >>> Background request is created: > >>> fuse_simple_background -> allocates request, sets refcount to 1 > >>> > >>> Request is replied to: > >>> fuse_dev_do_write > >>> fuse_request_end -> drops refcount on request > >>> > >>> Proper acquires on the request reference must be added to ensure that the > >>> timeout handler does not drop the last refcount on the request while > >>> other handlers may be operating on the request. Please note that the > >>> timeout handler may get invoked at any phase of the request's > >>> lifetime (eg before the request has been forwarded to userspace, etc). > >>> > >>> It is always guaranteed that there is a refcount on the request when the > >>> timeout handler is executing. The timeout handler will be either > >>> deactivated by the reply/abort/release handlers, or if the timeout > >>> handler is concurrently executing on another CPU, the reply/abort/release > >>> handlers will wait for the timeout handler to finish executing first before > >>> it drops the final refcount on the request. > >>> > >>> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> > >>> --- > >>> fs/fuse/dev.c | 187 +++++++++++++++++++++++++++++++++++++++++++++-- > >>> fs/fuse/fuse_i.h | 14 ++++ > >>> fs/fuse/inode.c | 7 ++ > >>> 3 files changed, 200 insertions(+), 8 deletions(-) > >>> > >>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > >>> index 9eb191b5c4de..9992bc5f4469 100644 > >>> --- a/fs/fuse/dev.c > >>> +++ b/fs/fuse/dev.c > >>> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse"); > >>> > >>> static struct kmem_cache *fuse_req_cachep; > >>> > >>> +static void fuse_request_timeout(struct timer_list *timer); > >>> + > >>> static struct fuse_dev *fuse_get_dev(struct file *file) > >>> { > >>> /* > >>> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req) > >>> refcount_set(&req->count, 1); > >>> __set_bit(FR_PENDING, &req->flags); > >>> req->fm = fm; > >>> + if (fm->fc->req_timeout) > >>> + timer_setup(&req->timer, fuse_request_timeout, 0); > >>> } > >>> > >>> static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags) > >>> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc) > >>> * the 'end' callback is called if given, else the reference to the > >>> * request is released > >>> */ > >>> -void fuse_request_end(struct fuse_req *req) > >>> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback) > >>> { > >>> struct fuse_mount *fm = req->fm; > >>> struct fuse_conn *fc = fm->fc; > >>> struct fuse_iqueue *fiq = &fc->iq; > >>> > >>> + if (from_timer_callback) > >>> + req->out.h.error = -ETIME; > >>> + > >> > >> FMHO, could we move the above error assignment up to the caller to make > >> do_fuse_request_end() look cleaner? > > > > Sure, I was thinking that it looks cleaner setting this in > > do_fuse_request_end() instead of having to set it in both > > timeout_pending_req() and timeout_inflight_req(), but I see your point > > as well. > > I'll make this change in v3. > > > >> > >> > >>> if (test_and_set_bit(FR_FINISHED, &req->flags)) > >>> goto put_request; > >>> > >>> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req) > >>> list_del_init(&req->intr_entry); > >>> spin_unlock(&fiq->lock); > >>> } > >>> - WARN_ON(test_bit(FR_PENDING, &req->flags)); > >>> - WARN_ON(test_bit(FR_SENT, &req->flags)); > >>> if (test_bit(FR_BACKGROUND, &req->flags)) { > >>> spin_lock(&fc->bg_lock); > >>> clear_bit(FR_BACKGROUND, &req->flags); > >>> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req) > >>> wake_up(&req->waitq); > >>> } > >>> > >>> + if (!from_timer_callback && req->timer.function) > >>> + timer_delete_sync(&req->timer); > >>> + > >> > >> Similarly, move the caller i.e. fuse_request_end() call > >> timer_delete_sync() instead? > > > > I don't think we can do that because the fuse_put_request() at the end > > of this function often holds the last refcount on the request which > > frees the request when it releases the ref. > > Initially I mean timer_delete_sync() could be called before > do_fuse_request_end() inside fuse_request_end(). But anyway it's a > rough idea just for making the code look cleaner, without thinking if > this logic change is right or not. Ahh I see now what you were saying. Great suggestion! I'll change this for v3. > > > >>> +static void timeout_pending_req(struct fuse_req *req) > >>> +{ > >>> + struct fuse_conn *fc = req->fm->fc; > >>> + struct fuse_iqueue *fiq = &fc->iq; > >>> + bool background = test_bit(FR_BACKGROUND, &req->flags); > >>> + > >>> + if (background) > >>> + spin_lock(&fc->bg_lock); > >> > >> Just out of curious, why fc->bg_lock is needed here, which makes the > >> code look less clean? > > > > The fc->bg_lock is needed because the background request may still be > > on the fc->bg_queue when the request times out (eg the request hasn't > > been flushed yet). We need to acquire the fc->bg_lock so that we can > > delete it from the queue, in case somehting else is modifying the > > queue at the same time. > > I can understand now. Thanks! > > > > >> > >> > >>> + spin_lock(&fiq->lock); > >>> + > >>> + if (!test_bit(FR_PENDING, &req->flags)) { > >>> + spin_unlock(&fiq->lock); > >>> + if (background) > >>> + spin_unlock(&fc->bg_lock); > >>> + timeout_inflight_req(req); > >>> + return; > >>> + } > >>> + > >>> + if (!test_bit(FR_PRIVATE, &req->flags)) > >>> + list_del_init(&req->list); > >>> + > >>> + spin_unlock(&fiq->lock); > >>> + if (background) > >>> + spin_unlock(&fc->bg_lock); > >>> + > >>> + do_fuse_request_end(req, true); > >> > >> I'm not sure if special handling for requests in fpq->io list in needed > >> here. At least when connection is aborted, thos LOCKED requests in > >> fpq->io list won't be finished instantly until they are unlocked. > >> > > > > The places where FR_LOCKED gets set on the request are in > > fuse_dev_do_write and fuse_dev_do_read when we do some of the page > > copying stuff. In both those functions, this timeout_pending_req() > > path isn't hit while we have the lock obtained - in fuse_dev_do_write, > > we test and set FR_FINISHING first before doing the page copying (the > > timeout handler will return before calling timeout_pending_req()), and > > in fuse_dev_do_read, the locking is called after the FR_PENDING flag > > has been cleared. > > > > I think there is a possibility that the timeout handler executes > > timeout_inflight_req() while the lock is obtained in fuse_dev_do_read > > during the page copying, but this patch added an extra > > __fuse_get_request() on the request before doing the page copying, > > which means the timeout handler will not free out the request while > > the lock is held and the page copying is being done. > > > > Yes, this is the only possible place where the timeout handler could > concurrently run while the request is in copying state (i.e. LOCKED). > As I described above, when connection is aborted, the LOCKED requests > will be left there and won't be finished until they are unlocked. I'm > not sure why this special handling is needed for LOCKED requests, but I > guess it's not because of UAF issue. From the comment of > lock_request(), "Up to the next unlock_request() there mustn't be > anything that could cause a page-fault.", though I can't understand in > which case there will be a page fault and it will be an issue. It's not clear to me either what in the end_requests() logic that abort_conn would call could cause a page fault. It would be great to get some light on this. For v3 I'm going to integrate Bernd's suggestion and disarm the timer before dev_do_read's main logic and rearm it after, which will also obviate this possible scenario of the timeout handler executing while the request is in locked copying state. Thanks, Joanne > > Maybe I'm wrong. It would be helpful if someone could shed light on this. > > > -- > Thanks, > Jingbo