Re: [PATCH] fuse: add optional kernel-enforced timeout for fuse requests

Joanne Koong <joannelkoong@xxxxxxxxx> · Thu, 18 Jul 2024 09:58:48 -0700

On Thu, Jul 18, 2024 at 7:44 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>
> On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote:
...
> > ---
> >  fs/fuse/dev.c    | 177 ++++++++++++++++++++++++++++++++++++++++++++---
> >  fs/fuse/fuse_i.h |  12 ++++
> >  fs/fuse/inode.c  |   7 ++
> >  3 files changed, 188 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 9eb191b5c4de..7dd7b244951b 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req)
> >  }
> >  EXPORT_SYMBOL_GPL(fuse_request_end);
> >
> > +/* fuse_request_end for requests that timeout */
> > +static void fuse_request_timeout(struct fuse_req *req)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     struct fuse_iqueue *fiq = &fc->iq;
> > +     struct fuse_pqueue *fpq;
> > +
> > +     spin_lock(&fiq->lock);
> > +     if (!fiq->connected) {
> > +             spin_unlock(&fiq->lock);
> > +             /*
> > +              * Connection is being aborted. The abort will release
> > +              * the refcount on the request
> > +              */
> > +             req->out.h.error = -ECONNABORTED;
> > +             return;
> > +     }
> > +     if (test_bit(FR_PENDING, &req->flags)) {
> > +             /* Request is not yet in userspace, bail out */
> > +             list_del(&req->list);
> > +             spin_unlock(&fiq->lock);
> > +             req->out.h.error = -ETIME;
> > +             __fuse_put_request(req);
>
> Why is this safe?  We could be the last holder of the reference on this request
> correct?  The only places using __fuse_put_request() would be where we are in a
> path where the caller already holds a reference on the request.  Since this is
> async it may not be the case right?
>
> If it is safe then it's just confusing and warrants a comment.
>

There is always a refcount still held on the request by
fuse_simple_request() when this is called. I'll add a comment about
this.
I also just noticed that I use fuse_put_request()  at the end of this
function, I'll change that to __fuse_put_request() as well.

> > +             return;
> > +     }
> > +     if (test_bit(FR_INTERRUPTED, &req->flags))
> > +             list_del_init(&req->intr_entry);
> > +
> > +     fpq = req->fpq;
> > +     spin_unlock(&fiq->lock);
> > +
> > +     if (fpq) {
> > +             spin_lock(&fpq->lock);
> > +             if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) {
>                                        ^^
>
> You don't need the extra () there.
>
> > +                     spin_unlock(&fpq->lock);
> > +                     /*
> > +                      * Connection is being aborted. The abort will release
> > +                      * the refcount on the request
> > +                      */
> > +                     req->out.h.error = -ECONNABORTED;
> > +                     return;
> > +             }
> > +             if (req->out.h.error == -ESTALE) {
> > +                     /*
> > +                      * Device is being released. The fuse_dev_release call
> > +                      * will drop the refcount on the request
> > +                      */
> > +                     spin_unlock(&fpq->lock);
> > +                     return;
> > +             }
> > +             if (!test_bit(FR_PRIVATE, &req->flags))
> > +                     list_del_init(&req->list);
> > +             spin_unlock(&fpq->lock);
> > +     }
> > +
> > +     req->out.h.error = -ETIME;
> > +
> > +     if (test_bit(FR_ASYNC, &req->flags))
> > +             req->args->end(req->fm, req->args, req->out.h.error);
> > +
> > +     fuse_put_request(req);
> > +}
>
> Just a general styling thing, we have two different states for requests here,
> PENDING and !PENDING correct?  I think it may be better to do something like
>
> if (test_bit(FR_PENDING, &req->flags))
>         timeout_pending_req();
> else
>         timeout_inflight_req();
>
> and then in timeout_pending_req() you do
>
> spin_lock(&fiq->lock);
> if (!test_bit(FR_PENDING, &req->flags)) {
>         spin_unlock(&fiq_lock);
>         timeout_inflight_req();
>         return;
> }
>
> This will keep the two different state cleanup functions separate and a little
> cleaner to grok.
>
Thanks for the suggestion, I will make this change for v2.
> > +
> >  static int queue_interrupt(struct fuse_req *req)
> >  {
> >       struct fuse_iqueue *fiq = &req->fm->fc->iq;
> > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req)
> >       return 0;
> >  }
> >
> > +enum wait_type {
> > +     WAIT_TYPE_INTERRUPTIBLE,
> > +     WAIT_TYPE_KILLABLE,
> > +     WAIT_TYPE_NONINTERRUPTIBLE,
> > +};
> > +
> > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +
> > +     return wait_event_interruptible_timeout(req->waitq,
> > +                                             test_bit(FR_FINISHED,
> > +                                                      &req->flags),
> > +                                             fc->daemon_timeout);
> > +}
> > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO);
> > +
> > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     int err;
> > +
> > +wait_answer_start:
> > +     if (type == WAIT_TYPE_INTERRUPTIBLE)
> > +             err = fuse_wait_event_interruptible_timeout(req);
> > +     else if (type == WAIT_TYPE_KILLABLE)
> > +             err = wait_event_killable_timeout(req->waitq,
> > +                                               test_bit(FR_FINISHED, &req->flags),
> > +                                               fc->daemon_timeout);
> > +
> > +     else if (type == WAIT_TYPE_NONINTERRUPTIBLE)
> > +             err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags),
> > +                                      fc->daemon_timeout);
> > +     else
> > +             WARN_ON(1);
>
> This will leak some random value for err, so initialize err to something that
> will be dealt with, like -EINVAL;
>
> > +
> > +     /* request was answered */
> > +     if (err > 0)
> > +             return 0;
> > +
> > +     /* request was not answered in time */
> > +     if (err == 0) {
> > +             if (test_and_set_bit(FR_PROCESSING, &req->flags))
> > +                     /* request reply is being processed by kernel right now.
> > +                      * we should wait for the answer.
> > +                      */
>
> Format for multiline comments is
>
> /*
>  * blah
>  * blah
>  */
>
> and since this is a 1 line if statement put it above the if statement.
>
> > +                     goto wait_answer_start;
> > +
> > +             fuse_request_timeout(req);
> > +             return 0;
> > +     }
> > +
> > +     /* else request was interrupted */
> > +     return err;
> > +}
> > +
> >  static void request_wait_answer(struct fuse_req *req)
> >  {
> >       struct fuse_conn *fc = req->fm->fc;
> > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req)
> >
> >       if (!fc->no_interrupt) {
> >               /* Any signal may interrupt this */
> > -             err = wait_event_interruptible(req->waitq,
> > -                                     test_bit(FR_FINISHED, &req->flags));
> > +             if (fc->daemon_timeout)
> > +                     err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE);
> > +             else
> > +                     err = wait_event_interruptible(req->waitq,
> > +                                                    test_bit(FR_FINISHED, &req->flags));
> >               if (!err)
> >                       return;
> >
> > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req)
> >
> >       if (!test_bit(FR_FORCE, &req->flags)) {
> >               /* Only fatal signals may interrupt this */
> > -             err = wait_event_killable(req->waitq,
> > -                                     test_bit(FR_FINISHED, &req->flags));
> > +             if (fc->daemon_timeout)
> > +                     err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE);
> > +             else
> > +                     err = wait_event_killable(req->waitq,
> > +                                               test_bit(FR_FINISHED, &req->flags));
> >               if (!err)
> >                       return;
> >
> > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req)
> >        * Either request is already in userspace, or it was forced.
> >        * Wait it out.
> >        */
> > -     wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> > +     if (fc->daemon_timeout)
> > +             wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE);
> > +     else
> > +             wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> >  }
> >
> >  static void __fuse_request_send(struct fuse_req *req)
> > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >       req = list_entry(fiq->pending.next, struct fuse_req, list);
> >       clear_bit(FR_PENDING, &req->flags);
> >       list_del_init(&req->list);
> > +     /* Acquire a reference since fuse_request_timeout may also be executing  */
> > +     __fuse_get_request(req);
> > +     req->fpq = fpq;
> >       spin_unlock(&fiq->lock);
> >
>
> There's a race here with completion.  If we timeout a request right here, we can
> end up sending that same request below.

I don't think there's any way around this unless we take the fpq lock
while we do the fuse_copy stuff, because even if we check the
FR_PROCESSING bit, the timeout handler could start running after the
fpq lock is released when we do the fuse_copy calls.

In my point of view, I don't think this race matters. We could have
this situation happen on a regular timed-out request. For example, we
send out a request to userspace and if the server takes too long to
reply, the request is cancelled/invalidated in the kernel but the
server will still see the request anyways.

WDYT?

>
> You are going to need to check
>
> test_bit(FR_PROCESSING)
>
> after you take the fpq->lock just below here to make sure you didn't race with
> the timeout handler and time the request out already.
>
> >       args = req->args;
> > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >               if (args->opcode == FUSE_SETXATTR)
> >                       req->out.h.error = -E2BIG;
> >               fuse_request_end(req);
> > +             fuse_put_request(req);
> >               goto restart;
> >       }
> >       spin_lock(&fpq->lock);
> > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >       }
> >       hash = fuse_req_hash(req->in.h.unique);
> >       list_move_tail(&req->list, &fpq->processing[hash]);
> > -     __fuse_get_request(req);
> >       set_bit(FR_SENT, &req->flags);
> >       spin_unlock(&fpq->lock);
> >       /* matches barrier in request_wait_answer() */
> >       smp_mb__after_atomic();
> >       if (test_bit(FR_INTERRUPTED, &req->flags))
> >               queue_interrupt(req);
> > +
> > +     /* Check if request timed out */
> > +     if (test_bit(FR_PROCESSING, &req->flags)) {
> > +             spin_lock(&fpq->lock);
> > +             if (!test_bit(FR_PRIVATE, &req->flags))
> > +                     list_del_init(&req->list);
> > +             spin_unlock(&fpq->lock);
> > +             fuse_put_request(req);
> > +             return -ETIME;
> > +     }
>
> This isn't quite right, we could have FR_PROCESSING set because we completed the
> request before we got here.  If you put a schedule_timeout(HZ); right above this
> you could easily see where a request gets completed by userspace, but now you've
> fimed it out.

Oh I see, you're talking about the race where a request is replied to
immediately after the fuse_copy calls and before this gets called.
Then when we get here, we can't differentiate between whether
FR_PROCESSING was set by the timeout handler or the reply handler.

I think the simplest way around this is to check if the FR_SENT flag
was cleared (the reply handler clears it while holding the fpq lock
where FR_PROCESSING gets set and the timeout handler doesn't clear
it), then return -ETIME if it wasn't and 0 if it was.

I'll add this into v2.

>
> Additionally if we have FR_PROCESSING set from the timeout, shouldn't this
> cleanup have been done already?  I don't understand why we need to handle this
> here, we should just return and whoever is waiting on the request will get the
> error.

In most cases yes, but there is a race where the timeout handler may
finish executing before the logic in dev_do_read that adds the request
to the fpq lists. If this happens, the freed request will remain on
the list.

i think this race currently exists  prior to these changes as well -
in the case you mentioned above where the request may have been
completed right after the fuse_copy calls in dev_do_read  and before
dev_do_read moves the request to the fpq lists. We would get into the
same situation with a freed request still on the list.

Thanks,
Joanne
>
> Thanks,
>
> Josef