Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests

On Mon, Aug 5, 2024 at 6:26 AM Bernd Schubert
<bernd.schubert@xxxxxxxxxxx> wrote:
>
>
>
> On 8/5/24 06:52, Joanne Koong wrote:
> > On Mon, Jul 29, 2024 at 5:28 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >>
> >> There are situations where fuse servers can become unresponsive or take
> >> too long to reply to a request. Currently there is no upper bound on
> >> how long a request may take, which may be frustrating to users who get
> >> stuck waiting for a request to complete.
> >>
> >> This commit adds a timeout option (in seconds) for requests. If the
> >> timeout elapses before the server replies to the request, the request
> >> will fail with -ETIME.
> >>
> >> There are 3 possibilities for a request that times out:
> >> a) The request times out before the request has been sent to userspace
> >> b) The request times out after the request has been sent to userspace
> >> and before it receives a reply from the server
> >> c) The request times out after the request has been sent to userspace
> >> and the server replies while the kernel is timing out the request
> >>
> >> While a request timeout is being handled, there may be other handlers
> >> running at the same time if:
> >> a) the kernel is forwarding the request to the server
> >> b) the kernel is processing the server's reply to the request
> >> c) the request is being re-sent
> >> d) the connection is aborting
> >> e) the device is getting released
> >>
> >> Proper synchronization must be added to ensure that the request is
> >> handled correctly in all of these cases. To this effect, there is a new
> >> FR_FINISHING bit added to the request flags, which is set atomically by
> >> either the timeout handler (see fuse_request_timeout()) which is invoked
> >> after the request timeout elapses or set by the request reply handler
> >> (see dev_do_write()), whichever gets there first. If the reply handler
> >> and the timeout handler are executing simultaneously and the reply handler
> >> sets FR_FINISHING before the timeout handler, then the request will be
> >> handled as if the timeout did not elapse. If the timeout handler sets
> >> FR_FINISHING before the reply handler, then the request will fail with
> >> -ETIME and the request will be cleaned up.
> >>
> >> Currently, this is the refcount lifecycle of a request:
> >>
> >> Synchronous request is created:
> >> fuse_simple_request -> allocates request, sets refcount to 1
> >>    __fuse_request_send -> acquires refcount
> >>      queues request and waits for reply...
> >> fuse_simple_request -> drops refcount
> >>
> >> Background request is created:
> >> fuse_simple_background -> allocates request, sets refcount to 1
> >>
> >> Request is replied to:
> >> fuse_dev_do_write
> >>    fuse_request_end -> drops refcount on request
> >>
> >> Proper acquires on the request reference must be added to ensure that the
> >> timeout handler does not drop the last refcount on the request while
> >> other handlers may be operating on the request. Please note that the
> >> timeout handler may get invoked at any phase of the request's
> >> lifetime (eg before the request has been forwarded to userspace, etc).
> >>
> >> It is always guaranteed that there is a refcount on the request when the
> >> timeout handler is executing. The timeout handler will be either
> >> deactivated by the reply/abort/release handlers, or if the timeout
> >> handler is concurrently executing on another CPU, the reply/abort/release
> >> handlers will wait for the timeout handler to finish executing first before
> >> it drops the final refcount on the request.
> >>
> >> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> >> ---
> >>   fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
> >>   fs/fuse/fuse_i.h |  14 ++++
> >>   fs/fuse/inode.c  |   7 ++
> >>   3 files changed, 200 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> >> index 9eb191b5c4de..9992bc5f4469 100644
> >> --- a/fs/fuse/dev.c
> >> +++ b/fs/fuse/dev.c
> >> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
> >>
> >>   static struct kmem_cache *fuse_req_cachep;
> >>
> >> +static void fuse_request_timeout(struct timer_list *timer);
> >> +
> >>   static struct fuse_dev *fuse_get_dev(struct file *file)
> >>   {
> >>          /*
> >> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> >>          refcount_set(&req->count, 1);
> >>          __set_bit(FR_PENDING, &req->flags);
> >>          req->fm = fm;
> >> +       if (fm->fc->req_timeout)
> >> +               timer_setup(&req->timer, fuse_request_timeout, 0);
> >>   }
> >>
> >>   static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> >> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
> >>    * the 'end' callback is called if given, else the reference to the
> >>    * request is released
> >>    */
> >> -void fuse_request_end(struct fuse_req *req)
> >> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
> >>   {
> >>          struct fuse_mount *fm = req->fm;
> >>          struct fuse_conn *fc = fm->fc;
> >>          struct fuse_iqueue *fiq = &fc->iq;
> >>
> >> +       if (from_timer_callback)
> >> +               req->out.h.error = -ETIME;
> >> +
> >>          if (test_and_set_bit(FR_FINISHED, &req->flags))
> >>                  goto put_request;
> >>
> >> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
> >>                  list_del_init(&req->intr_entry);
> >>                  spin_unlock(&fiq->lock);
> >>          }
> >> -       WARN_ON(test_bit(FR_PENDING, &req->flags));
> >> -       WARN_ON(test_bit(FR_SENT, &req->flags));
> >>          if (test_bit(FR_BACKGROUND, &req->flags)) {
> >>                  spin_lock(&fc->bg_lock);
> >>                  clear_bit(FR_BACKGROUND, &req->flags);
> >> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
> >>                  wake_up(&req->waitq);
> >>          }
> >>
> >> +       if (!from_timer_callback && req->timer.function)
> >> +               timer_delete_sync(&req->timer);
> >> +
> >>          if (test_bit(FR_ASYNC, &req->flags))
> >>                  req->args->end(fm, req->args, req->out.h.error);
> >>   put_request:
> >>          fuse_put_request(req);
> >>   }
> >> +
> >> +void fuse_request_end(struct fuse_req *req)
> >> +{
> >> +       WARN_ON(test_bit(FR_PENDING, &req->flags));
> >> +       WARN_ON(test_bit(FR_SENT, &req->flags));
> >> +
> >> +       do_fuse_request_end(req, false);
> >> +}
> >>   EXPORT_SYMBOL_GPL(fuse_request_end);
> >>
> >> +static void timeout_inflight_req(struct fuse_req *req)
> >> +{
> >> +       struct fuse_conn *fc = req->fm->fc;
> >> +       struct fuse_iqueue *fiq = &fc->iq;
> >> +       struct fuse_pqueue *fpq;
> >> +
> >> +       spin_lock(&fiq->lock);
> >> +       fpq = req->fpq;
> >> +       spin_unlock(&fiq->lock);
> >> +
> >> +       /*
> >> +        * If fpq has not been set yet, then the request is aborting (which
> >> +        * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
> >> +        * has been called. Let the abort handler handle this request.
> >> +        */
> >> +       if (!fpq)
> >> +               return;
> >> +
> >> +       spin_lock(&fpq->lock);
> >> +       if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
> >> +               /*
> >> +                * Connection is being aborted or the fuse_dev is being released.
> >> +                * The abort / release will clean up the request
> >> +                */
> >> +               spin_unlock(&fpq->lock);
> >> +               return;
> >> +       }
> >> +
> >> +       if (!test_bit(FR_PRIVATE, &req->flags))
> >> +               list_del_init(&req->list);
> >> +
> >> +       spin_unlock(&fpq->lock);
> >> +
> >> +       do_fuse_request_end(req, true);
> >> +}
> >> +
> >> +static void timeout_pending_req(struct fuse_req *req)
> >> +{
> >> +       struct fuse_conn *fc = req->fm->fc;
> >> +       struct fuse_iqueue *fiq = &fc->iq;
> >> +       bool background = test_bit(FR_BACKGROUND, &req->flags);
> >> +
> >> +       if (background)
> >> +               spin_lock(&fc->bg_lock);
> >> +       spin_lock(&fiq->lock);
> >> +
> >> +       if (!test_bit(FR_PENDING, &req->flags)) {
> >> +               spin_unlock(&fiq->lock);
> >> +               if (background)
> >> +                       spin_unlock(&fc->bg_lock);
> >> +               timeout_inflight_req(req);
> >> +               return;
> >> +       }
> >> +
> >> +       if (!test_bit(FR_PRIVATE, &req->flags))
> >> +               list_del_init(&req->list);
> >> +
> >> +       spin_unlock(&fiq->lock);
> >> +       if (background)
> >> +               spin_unlock(&fc->bg_lock);
> >> +
> >> +       do_fuse_request_end(req, true);
> >> +}
> >> +
> >> +static void fuse_request_timeout(struct timer_list *timer)
> >> +{
> >> +       struct fuse_req *req = container_of(timer, struct fuse_req, timer);
> >> +
> >> +       /*
> >> +        * Request reply is being finished by the kernel right now.
> >> +        * No need to time out the request.
> >> +        */
> >> +       if (test_and_set_bit(FR_FINISHING, &req->flags))
> >> +               return;
> >> +
> >> +       if (test_bit(FR_PENDING, &req->flags))
> >> +               timeout_pending_req(req);
> >> +       else
> >> +               timeout_inflight_req(req);
> >> +}
> >> +
> >>   static int queue_interrupt(struct fuse_req *req)
> >>   {
> >>          struct fuse_iqueue *fiq = &req->fm->fc->iq;
> >> @@ -409,7 +506,8 @@ static void request_wait_answer(struct fuse_req *req)
> >>
> >>   static void __fuse_request_send(struct fuse_req *req)
> >>   {
> >> -       struct fuse_iqueue *fiq = &req->fm->fc->iq;
> >> +       struct fuse_conn *fc = req->fm->fc;
> >> +       struct fuse_iqueue *fiq = &fc->iq;
> >>
> >>          BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
> >>          spin_lock(&fiq->lock);
> >> @@ -421,6 +519,10 @@ static void __fuse_request_send(struct fuse_req *req)
> >>                  /* acquire extra reference, since request is still needed
> >>                     after fuse_request_end() */
> >>                  __fuse_get_request(req);
> >> +               if (req->timer.function) {
> >> +                       req->timer.expires = jiffies + fc->req_timeout;
> >> +                       add_timer(&req->timer);
> >> +               }
> >>                  queue_request_and_unlock(fiq, req);
> >>
> >>                  request_wait_answer(req);
> >> @@ -539,6 +641,10 @@ static bool fuse_request_queue_background(struct fuse_req *req)
> >>                  if (fc->num_background == fc->max_background)
> >>                          fc->blocked = 1;
> >>                  list_add_tail(&req->list, &fc->bg_queue);
> >> +               if (req->timer.function) {
> >> +                       req->timer.expires = jiffies + fc->req_timeout;
> >> +                       add_timer(&req->timer);
> >> +               }
> >>                  flush_bg_queue(fc);
> >>                  queued = true;
> >>          }
> >> @@ -1268,6 +1374,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >>          req = list_entry(fiq->pending.next, struct fuse_req, list);
> >>          clear_bit(FR_PENDING, &req->flags);
> >>          list_del_init(&req->list);
> >> +       /* Acquire a reference in case the timeout handler starts executing */
> >> +       __fuse_get_request(req);
> >> +       req->fpq = fpq;
> >>          spin_unlock(&fiq->lock);
> >>
> >>          args = req->args;
> >> @@ -1280,6 +1389,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >>                  if (args->opcode == FUSE_SETXATTR)
> >>                          req->out.h.error = -E2BIG;
> >>                  fuse_request_end(req);
> >> +               fuse_put_request(req);
> >>                  goto restart;
> >
> > While rereading through fuse_dev_do_read, I just realized we also need
> > to handle the race condition for the error edge cases (here and in the
> > "goto out_end;"), since the timeout handler could have finished
> > executing by the time we hit the error edge case. We need to
> > test_and_set_bit(FR_FINISHING) so that either the timeout_handler or
> > dev_do_read cleans up the request, but not both. I'll fix this for v3.
>
> I know it would change semantics a bit, but wouldn't it be much easier /
> less racy if fuse_dev_do_read() would delete the timer when it takes a
> request from fiq->pending and add it back in (with new timeouts) before
> it returns the request?
>

Ooo I really like this idea! I'm worried though that this might allow
scenarios where fuse_dev_do_read() gets descheduled after disarming the
timer and a non-trivial amount of time elapses before it gets scheduled
back (eg on a system where the CPU is starved), in which case the fuse
req_timeout value will be (somewhat of) a lie, since the request can
take noticeably longer than req_timeout to fail. If you and others think
this is likely fine though, then I'll incorporate this into v3, which
will make this logic a lot simpler :)
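
To put rough numbers on that concern, here is a userspace sketch (not
kernel code) of how late a request could fail with -ETIME if the timer is
deleted in fuse_dev_do_read() and re-armed with a fresh req_timeout before
the function returns. The function name and the tick unit are made up for
illustration, and the model assumes the re-armed timer gets the full
req_timeout again.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code. All values use the same
 * arbitrary tick unit (think jiffies).
 *
 * queued_for:   ticks the request sat on fiq->pending before
 *               fuse_dev_do_read() picked it up and deleted the timer
 * disarmed_for: ticks spent in fuse_dev_do_read() with no timer armed
 *               (copying to userspace, or being descheduled)
 * req_timeout:  the configured timeout
 *
 * Returns the tick, counted from when the request was queued, at
 * which the re-armed timer would fire.
 */
static uint64_t worst_case_expiry(uint64_t queued_for, uint64_t disarmed_for,
				  uint64_t req_timeout)
{
	/* Otherwise the first timer would already have fired. */
	assert(queued_for < req_timeout);

	return queued_for + disarmed_for + req_timeout;
}
```

With req_timeout at 30 ticks, a request picked up after 29 ticks and
re-armed 5 ticks later would only time out at tick 64, so more than twice
the configured value.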


Thanks,
Joanne

> Untested:
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 9992bc5f4469..444f667e2f43 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1379,6 +1379,15 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>          req->fpq = fpq;
>          spin_unlock(&fiq->lock);
>
> +       if (req->timer.function) {
> +               /* request gets handled, remove the previous timeout */
> +               timer_delete_sync(&req->timer);
> +               if (test_bit(FR_FINISHED, &req->flags)) {
> +                       fuse_put_request(req);
> +                       goto restart;
> +               }
> +       }
> +
>          args = req->args;
>          reqsize = req->in.h.len;
>
> @@ -1433,24 +1442,10 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>          if (test_bit(FR_INTERRUPTED, &req->flags))
>                  queue_interrupt(req);
>
> -       /*
> -        * Check if the timeout handler is running / ran. If it did, we need to
> -        * remove the request from any lists in case the timeout handler finished
> -        * before dev_do_read moved the request to the processing list.
> -        *
> -        * Check FR_SENT to distinguish whether the timeout or the write handler
> -        * is finishing the request. However, there can be the case where the
> -        * timeout handler and resend handler are running concurrently, so we
> -        * need to also check the FR_PENDING bit.
> -        */
> -       if (test_bit(FR_FINISHING, &req->flags) &&
> -           (test_bit(FR_SENT, &req->flags) || test_bit(FR_PENDING, &req->flags))) {
> -               spin_lock(&fpq->lock);
> -               if (!test_bit(FR_PRIVATE, &req->flags))
> -                       list_del_init(&req->list);
> -               spin_unlock(&fpq->lock);
> -               fuse_put_request(req);
> -               return -ETIME;
> +       if (req->timer.function) {
> +               /* re-arm the request */
> +               req->timer.expires = jiffies + fc->req_timeout;
> +               add_timer(&req->timer);
>          }
>
>          fuse_put_request(req);
>
> Thanks,
> Bernd
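
For reference, the FR_FINISHING handshake described in the commit message
(whichever of the reply handler and the timeout handler sets the bit first
owns the request) can be sketched in userspace C with C11 atomics. This is
only an analogue of test_and_set_bit(); every demo_* name and the
DEMO_FINISHING bit are hypothetical, and the two "handlers" run one after
the other here instead of on separate CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for FR_FINISHING. */
#define DEMO_FINISHING (1u << 0)

enum demo_outcome { DEMO_REPLIED, DEMO_TIMED_OUT };

struct demo_req {
	atomic_uint flags;
};

/*
 * Userspace analogue of test_and_set_bit(): atomically set the bit and
 * return true if it was already set, i.e. the caller lost the race.
 */
static bool demo_test_and_set(struct demo_req *req, unsigned int bit)
{
	assert(req != NULL);
	return (atomic_fetch_or(&req->flags, bit) & bit) != 0;
}

/* Run both handlers in the given order and report who finished the request. */
static enum demo_outcome demo_race(bool reply_first)
{
	struct demo_req req;
	bool reply_lost, timeout_lost;

	atomic_init(&req.flags, 0);

	if (reply_first) {
		reply_lost = demo_test_and_set(&req, DEMO_FINISHING);
		timeout_lost = demo_test_and_set(&req, DEMO_FINISHING);
	} else {
		timeout_lost = demo_test_and_set(&req, DEMO_FINISHING);
		reply_lost = demo_test_and_set(&req, DEMO_FINISHING);
	}

	/* Exactly one of the two handlers owns the request. */
	assert(reply_lost != timeout_lost);

	return timeout_lost ? DEMO_REPLIED : DEMO_TIMED_OUT;
}
```

If the reply handler gets there first the request completes normally; if
the timeout handler gets there first the request would fail with -ETIME.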




