On Thu, Jul 18, 2024 at 9:58 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> On Thu, Jul 18, 2024 at 7:44 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote:
> ...
> > > ---
> > >  fs/fuse/dev.c    | 177 ++++++++++++++++++++++++++++++++++++++++++++---
> > >  fs/fuse/fuse_i.h |  12 ++++
> > >  fs/fuse/inode.c  |   7 ++
> > >  3 files changed, 188 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > > index 9eb191b5c4de..7dd7b244951b 100644
> > > --- a/fs/fuse/dev.c
> > > +++ b/fs/fuse/dev.c
> > > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req)
> > >  }
> > >  EXPORT_SYMBOL_GPL(fuse_request_end);
> > >
> > > +/* fuse_request_end for requests that timeout */
> > > +static void fuse_request_timeout(struct fuse_req *req)
> > > +{
> > > +        struct fuse_conn *fc = req->fm->fc;
> > > +        struct fuse_iqueue *fiq = &fc->iq;
> > > +        struct fuse_pqueue *fpq;
> > > +
> > > +        spin_lock(&fiq->lock);
> > > +        if (!fiq->connected) {
> > > +                spin_unlock(&fiq->lock);
> > > +                /*
> > > +                 * Connection is being aborted. The abort will release
> > > +                 * the refcount on the request
> > > +                 */
> > > +                req->out.h.error = -ECONNABORTED;
> > > +                return;
> > > +        }
> > > +        if (test_bit(FR_PENDING, &req->flags)) {
> > > +                /* Request is not yet in userspace, bail out */
> > > +                list_del(&req->list);
> > > +                spin_unlock(&fiq->lock);
> > > +                req->out.h.error = -ETIME;
> > > +                __fuse_put_request(req);
> >
> > Why is this safe? We could be the last holder of the reference on this request
> > correct? The only places using __fuse_put_request() would be where we are in a
> > path where the caller already holds a reference on the request. Since this is
> > async it may not be the case right?
> >
> > If it is safe then it's just confusing and warrants a comment.
> >
>
> There is always a refcount still held on the request by
> fuse_simple_request() when this is called. I'll add a comment about
> this.
> I also just noticed that I use fuse_put_request() at the end of this
> function, I'll change that to __fuse_put_request() as well.
>
> > > +                return;
> > > +        }
> > > +        if (test_bit(FR_INTERRUPTED, &req->flags))
> > > +                list_del_init(&req->intr_entry);
> > > +
> > > +        fpq = req->fpq;
> > > +        spin_unlock(&fiq->lock);
> > > +
> > > +        if (fpq) {
> > > +                spin_lock(&fpq->lock);
> > > +                if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) {
> >                                            ^^
> >
> > You don't need the extra () there.
> >
> > > +                        spin_unlock(&fpq->lock);
> > > +                        /*
> > > +                         * Connection is being aborted. The abort will release
> > > +                         * the refcount on the request
> > > +                         */
> > > +                        req->out.h.error = -ECONNABORTED;
> > > +                        return;
> > > +                }
> > > +                if (req->out.h.error == -ESTALE) {
> > > +                        /*
> > > +                         * Device is being released. The fuse_dev_release call
> > > +                         * will drop the refcount on the request
> > > +                         */
> > > +                        spin_unlock(&fpq->lock);
> > > +                        return;
> > > +                }
> > > +                if (!test_bit(FR_PRIVATE, &req->flags))
> > > +                        list_del_init(&req->list);
> > > +                spin_unlock(&fpq->lock);
> > > +        }
> > > +
> > > +        req->out.h.error = -ETIME;
> > > +
> > > +        if (test_bit(FR_ASYNC, &req->flags))
> > > +                req->args->end(req->fm, req->args, req->out.h.error);
> > > +
> > > +        fuse_put_request(req);
> > > +}
> >
> > Just a general styling thing, we have two different states for requests here,
> > PENDING and !PENDING correct? I think it may be better to do something like
> >
> > if (test_bit(FR_PENDING, &req->flags))
> >         timeout_pending_req();
> > else
> >         timeout_inflight_req();
> >
> > and then in timeout_pending_req() you do
> >
> > spin_lock(&fiq->lock);
> > if (!test_bit(FR_PENDING, &req->flags)) {
> >         spin_unlock(&fiq_lock);
> >         timeout_inflight_req();
> >         return;
> > }
> >
> > This will keep the two different state cleanup functions separate and a little
> > cleaner to grok.
> >
>
> Thanks for the suggestion, I will make this change for v2.
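
To make the v2 direction concrete, here is a rough, untested sketch of that
split (the helper names are the ones Josef suggested; timeout_inflight_req()
would just be the !FR_PENDING half of the current fuse_request_timeout(), and
the connection-abort handling is elided here):

/* !FR_PENDING half of the current fuse_request_timeout() */
static void timeout_inflight_req(struct fuse_req *req);

static void timeout_pending_req(struct fuse_req *req)
{
        struct fuse_iqueue *fiq = &req->fm->fc->iq;

        spin_lock(&fiq->lock);
        if (!test_bit(FR_PENDING, &req->flags)) {
                /* Raced with fuse_dev_do_read(), the request is in flight now */
                spin_unlock(&fiq->lock);
                timeout_inflight_req(req);
                return;
        }
        /* Request is not yet in userspace, just pull it off the pending list */
        list_del(&req->list);
        spin_unlock(&fiq->lock);
        req->out.h.error = -ETIME;
        /* fuse_simple_request() still holds a reference, so this is not the last put */
        __fuse_put_request(req);
}

static void fuse_request_timeout(struct fuse_req *req)
{
        if (test_bit(FR_PENDING, &req->flags))
                timeout_pending_req(req);
        else
                timeout_inflight_req(req);
}
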
> > > +
> > >  static int queue_interrupt(struct fuse_req *req)
> > >  {
> > >          struct fuse_iqueue *fiq = &req->fm->fc->iq;
> > > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req)
> > >          return 0;
> > >  }
> > >
> > > +enum wait_type {
> > > +        WAIT_TYPE_INTERRUPTIBLE,
> > > +        WAIT_TYPE_KILLABLE,
> > > +        WAIT_TYPE_NONINTERRUPTIBLE,
> > > +};
> > > +
> > > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req)
> > > +{
> > > +        struct fuse_conn *fc = req->fm->fc;
> > > +
> > > +        return wait_event_interruptible_timeout(req->waitq,
> > > +                                                test_bit(FR_FINISHED,
> > > +                                                         &req->flags),
> > > +                                                fc->daemon_timeout);
> > > +}
> > > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO);
> > > +
> > > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type)
> > > +{
> > > +        struct fuse_conn *fc = req->fm->fc;
> > > +        int err;
> > > +
> > > +wait_answer_start:
> > > +        if (type == WAIT_TYPE_INTERRUPTIBLE)
> > > +                err = fuse_wait_event_interruptible_timeout(req);
> > > +        else if (type == WAIT_TYPE_KILLABLE)
> > > +                err = wait_event_killable_timeout(req->waitq,
> > > +                                                  test_bit(FR_FINISHED, &req->flags),
> > > +                                                  fc->daemon_timeout);
> > > +
> > > +        else if (type == WAIT_TYPE_NONINTERRUPTIBLE)
> > > +                err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags),
> > > +                                         fc->daemon_timeout);
> > > +        else
> > > +                WARN_ON(1);
> >
> > This will leak some random value for err, so initialize err to something that
> > will be dealt with, like -EINVAL;
> >
> > > +
> > > +        /* request was answered */
> > > +        if (err > 0)
> > > +                return 0;
> > > +
> > > +        /* request was not answered in time */
> > > +        if (err == 0) {
> > > +                if (test_and_set_bit(FR_PROCESSING, &req->flags))
> > > +                        /* request reply is being processed by kernel right now.
> > > +                         * we should wait for the answer.
> > > +                         */
> >
> > Format for multiline comments is
> >
> > /*
> >  * blah
> >  * blah
> >  */
> >
> > and since this is a 1 line if statement put it above the if statement.
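
For v2, that check would then read something like this (comment reflowed and
hoisted above the one-line if):

                /*
                 * The request reply is being processed by the kernel right
                 * now, we should wait for the answer.
                 */
                if (test_and_set_bit(FR_PROCESSING, &req->flags))
                        goto wait_answer_start;
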
> >
> > > +                        goto wait_answer_start;
> > > +
> > > +                fuse_request_timeout(req);
> > > +                return 0;
> > > +        }
> > > +
> > > +        /* else request was interrupted */
> > > +        return err;
> > > +}
> > > +
> > >  static void request_wait_answer(struct fuse_req *req)
> > >  {
> > >          struct fuse_conn *fc = req->fm->fc;
> > > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req)
> > >
> > >          if (!fc->no_interrupt) {
> > >                  /* Any signal may interrupt this */
> > > -                err = wait_event_interruptible(req->waitq,
> > > -                                               test_bit(FR_FINISHED, &req->flags));
> > > +                if (fc->daemon_timeout)
> > > +                        err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE);
> > > +                else
> > > +                        err = wait_event_interruptible(req->waitq,
> > > +                                                       test_bit(FR_FINISHED, &req->flags));
> > >                  if (!err)
> > >                          return;
> > >
> > > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req)
> > >
> > >          if (!test_bit(FR_FORCE, &req->flags)) {
> > >                  /* Only fatal signals may interrupt this */
> > > -                err = wait_event_killable(req->waitq,
> > > -                                          test_bit(FR_FINISHED, &req->flags));
> > > +                if (fc->daemon_timeout)
> > > +                        err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE);
> > > +                else
> > > +                        err = wait_event_killable(req->waitq,
> > > +                                                  test_bit(FR_FINISHED, &req->flags));
> > >                  if (!err)
> > >                          return;
> > >
> > > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req)
> > >           * Either request is already in userspace, or it was forced.
> > >           * Wait it out.
> > >           */
> > > -        wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> > > +        if (fc->daemon_timeout)
> > > +                wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE);
> > > +        else
> > > +                wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> > >  }
> > >
> > >  static void __fuse_request_send(struct fuse_req *req)
> > > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> > >          req = list_entry(fiq->pending.next, struct fuse_req, list);
> > >          clear_bit(FR_PENDING, &req->flags);
> > >          list_del_init(&req->list);
> > > +        /* Acquire a reference since fuse_request_timeout may also be executing */
> > > +        __fuse_get_request(req);
> > > +        req->fpq = fpq;
> > >          spin_unlock(&fiq->lock);
> > >
> > There's a race here with completion. If we timeout a request right here, we can
> > end up sending that same request below.
>
> I don't think there's any way around this unless we take the fpq lock
> while we do the fuse_copy stuff, because even if we check the
> FR_PROCESSING bit, the timeout handler could start running after the
> fpq lock is released when we do the fuse_copy calls.
>
> In my point of view, I don't think this race matters. We could have
> this situation happen on a regular timed-out request. For example, we
> send out a request to userspace and if the server takes too long to
> reply, the request is cancelled/invalidated in the kernel but the
> server will still see the request anyways.
>
> WDYT?
>
> > You are going to need to check
> >
> > test_bit(FR_PROCESSING)
> >
> > after you take the fpq->lock just below here to make sure you didn't race with
> > the timeout handler and time the request out already.
> >
> > >          args = req->args;
> > > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> > >                  if (args->opcode == FUSE_SETXATTR)
> > >                          req->out.h.error = -E2BIG;
> > >                  fuse_request_end(req);
> > > +                fuse_put_request(req);
> > >                  goto restart;
> > >          }
> > >          spin_lock(&fpq->lock);
> > > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> > >          }
> > >          hash = fuse_req_hash(req->in.h.unique);
> > >          list_move_tail(&req->list, &fpq->processing[hash]);
> > > -        __fuse_get_request(req);
> > >          set_bit(FR_SENT, &req->flags);
> > >          spin_unlock(&fpq->lock);
> > >          /* matches barrier in request_wait_answer() */
> > >          smp_mb__after_atomic();
> > >          if (test_bit(FR_INTERRUPTED, &req->flags))
> > >                  queue_interrupt(req);
> > > +
> > > +        /* Check if request timed out */
> > > +        if (test_bit(FR_PROCESSING, &req->flags)) {
> > > +                spin_lock(&fpq->lock);
> > > +                if (!test_bit(FR_PRIVATE, &req->flags))
> > > +                        list_del_init(&req->list);
> > > +                spin_unlock(&fpq->lock);
> > > +                fuse_put_request(req);
> > > +                return -ETIME;
> > > +        }
> >
> > This isn't quite right, we could have FR_PROCESSING set because we completed the
> > request before we got here. If you put a schedule_timeout(HZ); right above this
> > you could easily see where a request gets completed by userspace, but now you've
> > timed it out.
>
> Oh I see, you're talking about the race where a request is replied to
> immediately after the fuse_copy calls and before this gets called.
> Then when we get here, we can't differentiate between whether
> FR_PROCESSING was set by the timeout handler or the reply handler.
>
> I think the simplest way around this is to check if the FR_SENT flag
> was cleared (the reply handler clears it while holding the fpq lock
> where FR_PROCESSING gets set and the timeout handler doesn't clear
> it), then return -ETIME if it wasn't and 0 if it was.
>
> I'll add this into v2.
>
> > Additionally if we have FR_PROCESSING set from the timeout, shouldn't this
> > cleanup have been done already? I don't understand why we need to handle this
> > here, we should just return and whoever is waiting on the request will get the
> > error.
>
> In most cases yes, but there is a race where the timeout handler may
> finish executing before the logic in dev_do_read that adds the request
> to the fpq lists. If this happens, the freed request will remain on
> the list.
>
> I think this race currently exists prior to these changes as well -

amendment: this statement is not accurate. In the existing code, there
is no race between the reply handler and dev_do_read, because the
reply handler can only handle the request once the request is on the
fpq->processing list. (We do need to account for this race with the
timeout handler though since the timeout handler can get triggered at
any time).

Also, while working on v2 I noticed we also need to handle races
between the timeout handler and requests being re-sent
(fuse_resend()). This will get addressed in v2.

> in the case you mentioned above where the request may have been
> completed right after the fuse_copy calls in dev_do_read and before
> dev_do_read moves the request to the fpq lists. We would get into the
> same situation with a freed request still on the list.
>
>
> Thanks,
> Joanne
>
> > Thanks,
> >
> > Josef
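
For reference, the FR_SENT check I mentioned above would look roughly like
this at the end of fuse_dev_do_read() (untested sketch; the exact shape may
still change once the fuse_resend() races are handled in v2):

        /* Check if the request timed out while we were copying it to userspace */
        if (test_bit(FR_PROCESSING, &req->flags)) {
                spin_lock(&fpq->lock);
                /*
                 * The reply handler clears FR_SENT under fpq->lock (the same
                 * place it sets FR_PROCESSING), and the timeout handler never
                 * clears it. So FR_SENT still being set here means
                 * FR_PROCESSING came from the timeout handler, not a reply.
                 */
                if (test_bit(FR_SENT, &req->flags)) {
                        if (!test_bit(FR_PRIVATE, &req->flags))
                                list_del_init(&req->list);
                        spin_unlock(&fpq->lock);
                        fuse_put_request(req);
                        return -ETIME;
                }
                spin_unlock(&fpq->lock);
                /* otherwise the request was answered, fall through as usual */
        }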