Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests

Joanne Koong <joannelkoong@xxxxxxxxx> · Sun, 4 Aug 2024 21:45:32 -0700

On Sun, Aug 4, 2024 at 3:46 PM Bernd Schubert
<bernd.schubert@xxxxxxxxxxx> wrote:
>
>
>
> On 7/30/24 02:23, Joanne Koong wrote:
> > There are situations where fuse servers can become unresponsive or take
> > too long to reply to a request. Currently there is no upper bound on
> > how long a request may take, which may be frustrating to users who get
> > stuck waiting for a request to complete.
> >
> > This commit adds a timeout option (in seconds) for requests. If the
> > timeout elapses before the server replies to the request, the request
> > will fail with -ETIME.
> >
> > There are 3 possibilities for a request that times out:
> > a) The request times out before the request has been sent to userspace
> > b) The request times out after the request has been sent to userspace
> > and before it receives a reply from the server
> > c) The request times out after the request has been sent to userspace
> > and the server replies while the kernel is timing out the request
> >
> > While a request timeout is being handled, there may be other handlers
> > running at the same time if:
> > a) the kernel is forwarding the request to the server
> > b) the kernel is processing the server's reply to the request
> > c) the request is being re-sent
> > d) the connection is aborting
> > e) the device is getting released
> >
> > Proper synchronization must be added to ensure that the request is
> > handled correctly in all of these cases. To this effect, there is a new
> > FR_FINISHING bit added to the request flags, which is set atomically by
> > either the timeout handler (see fuse_request_timeout()) which is invoked
> > after the request timeout elapses or set by the request reply handler
> > (see dev_do_write()), whichever gets there first. If the reply handler
> > and the timeout handler are executing simultaneously and the reply handler
> > sets FR_FINISHING before the timeout handler, then the request will be
> > handled as if the timeout did not elapse. If the timeout handler sets
> > FR_FINISHING before the reply handler, then the request will fail with
> > -ETIME and the request will be cleaned up.
> >
> > Currently, this is the refcount lifecycle of a request:
> >
> > Synchronous request is created:
> > fuse_simple_request -> allocates request, sets refcount to 1
> >   __fuse_request_send -> acquires refcount
> >     queues request and waits for reply...
> > fuse_simple_request -> drops refcount
> >
> > Background request is created:
> > fuse_simple_background -> allocates request, sets refcount to 1
> >
> > Request is replied to:
> > fuse_dev_do_write
> >   fuse_request_end -> drops refcount on request
> >
> > Proper acquires on the request reference must be added to ensure that the
> > timeout handler does not drop the last refcount on the request while
> > other handlers may be operating on the request. Please note that the
> > timeout handler may get invoked at any phase of the request's
> > lifetime (e.g. before the request has been forwarded to userspace).
> >
> > It is always guaranteed that there is a refcount on the request when the
> > timeout handler is executing. The timeout handler will be either
> > deactivated by the reply/abort/release handlers, or if the timeout
> > handler is concurrently executing on another CPU, the reply/abort/release
> > handlers will wait for the timeout handler to finish executing before
> > they drop the final refcount on the request.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> > ---
> >  fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/fuse/fuse_i.h |  14 ++++
> >  fs/fuse/inode.c  |   7 ++
> >  3 files changed, 200 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 9eb191b5c4de..9992bc5f4469 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
> >
> >  static struct kmem_cache *fuse_req_cachep;
> >
> > +static void fuse_request_timeout(struct timer_list *timer);
> > +
> >  static struct fuse_dev *fuse_get_dev(struct file *file)
> >  {
> >       /*
> > @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> >       refcount_set(&req->count, 1);
> >       __set_bit(FR_PENDING, &req->flags);
> >       req->fm = fm;
> > +     if (fm->fc->req_timeout)
> > +             timer_setup(&req->timer, fuse_request_timeout, 0);
> >  }
> >
> >  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> > @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
> >   * the 'end' callback is called if given, else the reference to the
> >   * request is released
> >   */
> > -void fuse_request_end(struct fuse_req *req)
> > +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
> >  {
> >       struct fuse_mount *fm = req->fm;
> >       struct fuse_conn *fc = fm->fc;
> >       struct fuse_iqueue *fiq = &fc->iq;
> >
> > +     if (from_timer_callback)
> > +             req->out.h.error = -ETIME;
> > +
> >       if (test_and_set_bit(FR_FINISHED, &req->flags))
> >               goto put_request;
> >
> > @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
> >               list_del_init(&req->intr_entry);
> >               spin_unlock(&fiq->lock);
> >       }
> > -     WARN_ON(test_bit(FR_PENDING, &req->flags));
> > -     WARN_ON(test_bit(FR_SENT, &req->flags));
> >       if (test_bit(FR_BACKGROUND, &req->flags)) {
> >               spin_lock(&fc->bg_lock);
> >               clear_bit(FR_BACKGROUND, &req->flags);
> > @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
> >               wake_up(&req->waitq);
> >       }
> >
> > +     if (!from_timer_callback && req->timer.function)
> > +             timer_delete_sync(&req->timer);
> > +
> >       if (test_bit(FR_ASYNC, &req->flags))
> >               req->args->end(fm, req->args, req->out.h.error);
> >  put_request:
> >       fuse_put_request(req);
> >  }
> > +
> > +void fuse_request_end(struct fuse_req *req)
> > +{
> > +     WARN_ON(test_bit(FR_PENDING, &req->flags));
> > +     WARN_ON(test_bit(FR_SENT, &req->flags));
> > +
> > +     do_fuse_request_end(req, false);
> > +}
> >  EXPORT_SYMBOL_GPL(fuse_request_end);
> >
> > +static void timeout_inflight_req(struct fuse_req *req)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     struct fuse_iqueue *fiq = &fc->iq;
> > +     struct fuse_pqueue *fpq;
> > +
> > +     spin_lock(&fiq->lock);
> > +     fpq = req->fpq;
> > +     spin_unlock(&fiq->lock);
> > +
> > +     /*
> > +      * If fpq has not been set yet, then the request is aborting (which
> > +      * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
> > +      * has been called. Let the abort handler handle this request.
> > +      */
> > +     if (!fpq)
> > +             return;
> > +
> > +     spin_lock(&fpq->lock);
> > +     if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
> > +             /*
> > +              * Connection is being aborted or the fuse_dev is being released.
> > +              * The abort / release will clean up the request
> > +              */
> > +             spin_unlock(&fpq->lock);
> > +             return;
> > +     }
> > +
> > +     if (!test_bit(FR_PRIVATE, &req->flags))
> > +             list_del_init(&req->list);
> > +
> > +     spin_unlock(&fpq->lock);
> > +
> > +     do_fuse_request_end(req, true);
> > +}
> > +
> > +static void timeout_pending_req(struct fuse_req *req)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     struct fuse_iqueue *fiq = &fc->iq;
> > +     bool background = test_bit(FR_BACKGROUND, &req->flags);
> > +
> > +     if (background)
> > +             spin_lock(&fc->bg_lock);
> > +     spin_lock(&fiq->lock);
> > +
> > +     if (!test_bit(FR_PENDING, &req->flags)) {
> > +             spin_unlock(&fiq->lock);
> > +             if (background)
> > +                     spin_unlock(&fc->bg_lock);
> > +             timeout_inflight_req(req);
> > +             return;
> > +     }
> > +
> > +     if (!test_bit(FR_PRIVATE, &req->flags))
> > +             list_del_init(&req->list);
> > +
> > +     spin_unlock(&fiq->lock);
> > +     if (background)
> > +             spin_unlock(&fc->bg_lock);
> > +
> > +     do_fuse_request_end(req, true);
> > +}
> > +
> > +static void fuse_request_timeout(struct timer_list *timer)
> > +{
> > +     struct fuse_req *req = container_of(timer, struct fuse_req, timer);
>
> Let's say the timeout handler races with the thread running
> fuse_dev_do_write(), and that thread is much faster and has already called:
>
> fuse_dev_do_write():
>         fuse_request_end(req);
>         fuse_put_request(req);
> out:
>         return err ? err : nbytes;
>
>
> (What I mean is that the timeout triggered, but did not reach
> FR_FINISHING yet and at the same time another thread on another core
> calls fuse_dev_do_write()).
>
> > +
> > +     /*
> > +      * Request reply is being finished by the kernel right now.
> > +      * No need to time out the request.
> > +      */
> > +     if (test_and_set_bit(FR_FINISHING, &req->flags))
> > +             return;
>
> Wouldn't that trigger a UAF if fuse_dev_do_write() was proceeding
> much faster and had already released the request?

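I don't believe so. In fuse_dev_do_write(), the call to
fuse_request_end() will call timer_delete_sync(), which will either
cancel the timer or wait for the timer callback to finish running if
it's concurrently running on another CPU. Either way, the final
refcount on the request is not dropped while the timeout handler is
still executing.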
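
To make the ordering concrete, here is a rough userspace sketch of the
argument. It is only an analogy, not the kernel code: the demo_* names
are hypothetical, atomic_fetch_or() stands in for test_and_set_bit() on
FR_FINISHING, a pthread stands in for the timer callback, and
pthread_join() stands in for the "wait for the running callback"
behavior of timer_delete_sync(). The reply path frees the request only
after the join, so the callback never touches freed memory, and exactly
one side claims the request.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the FR_FINISHING bit in req->flags. */
#define DEMO_FINISHING 1u

struct demo_req {
	atomic_uint flags;	/* stand-in for req->flags */
	atomic_int claims;	/* how many handlers claimed the request */
};

/* Userspace analogue of test_and_set_bit(): true if the bit was already set. */
static bool demo_test_and_set(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_or(flags, bit) & bit) != 0;
}

/* Stand-in for the timer callback (fuse_request_timeout() in the patch). */
static void *demo_timer_cb(void *arg)
{
	struct demo_req *req = arg;

	/* Reply path got there first: nothing to time out. */
	if (demo_test_and_set(&req->flags, DEMO_FINISHING))
		return NULL;

	/* The real handler would fail the request with -ETIME here. */
	atomic_fetch_add(&req->claims, 1);
	return NULL;
}

/*
 * Stand-in for the reply path. Returns how many handlers claimed the
 * request, which is 1 no matter which side wins the race.
 */
int demo_reply_vs_timeout(void)
{
	struct demo_req *req = malloc(sizeof(*req));
	pthread_t timer;
	int claims, rc;

	assert(req != NULL);
	atomic_init(&req->flags, 0);
	atomic_init(&req->claims, 0);

	/* The "timer" fires concurrently with the reply path. */
	rc = pthread_create(&timer, NULL, demo_timer_cb, req);
	assert(rc == 0);

	/* Reply path tries to claim the request. */
	if (!demo_test_and_set(&req->flags, DEMO_FINISHING))
		atomic_fetch_add(&req->claims, 1);

	/* Analogue of timer_delete_sync(): wait for a running callback. */
	rc = pthread_join(timer, NULL);
	assert(rc == 0);

	/* Only now is it safe to drop the last reference. */
	claims = atomic_load(&req->claims);
	free(req);
	return claims;
}
```

Calling demo_reply_vs_timeout() in a loop (built with -pthread) returns
1 on every iteration regardless of which thread sets the bit first. The
sketch leaves out the locking around the pending/processing lists that
the patch also has to deal with.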
>
> > +
> > +     if (test_bit(FR_PENDING, &req->flags))
> > +             timeout_pending_req(req);
> > +     else
> > +             timeout_inflight_req(req);
> > +}
> > +
> >  static int queue_interrupt(struct fuse_req *req)
> >  {
> >       struct fuse_iqueue *fiq = &req->fm->fc->iq;
> > @@ -409,7 +506,8 @@ static void request_wait_answer(struct fuse_req *req)
> >
> >  static void __fuse_request_send(struct fuse_req *req)
> >  {
> > -     struct fuse_iqueue *fiq = &req->fm->fc->iq;
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     struct fuse_iqueue *fiq = &fc->iq;
> >
> >       BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
> >       spin_lock(&fiq->lock);
> > @@ -421,6 +519,10 @@ static void __fuse_request_send(struct fuse_req *req)
> >               /* acquire extra reference, since request is still needed
> >                  after fuse_request_end() */
> >               __fuse_get_request(req);
> > +             if (req->timer.function) {
> > +                     req->timer.expires = jiffies + fc->req_timeout;
> > +                     add_timer(&req->timer);
> > +             }
>
> Does this leave a chance of arming the timer with a timeout of 0, if
> someone first sets fc->req_timeout and then sets it back to 0?

I don't think so. The req_timeout is per connection and specified at
mount time. Once the fc->req_timeout is set for the connection it
can't be changed even if the default_req_timeout sysctl gets set to 0.

>
>
> (I'm going to continue reviewing tomorrow, it's getting very late here).

Thanks for reviewing.
>
>
> Thanks,
> Bernd