Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests

Joanne Koong <joannelkoong@xxxxxxxxx> · Tue, 6 Aug 2024 10:08:56 -0700

On Tue, Aug 6, 2024 at 8:43 AM Bernd Schubert
<bernd.schubert@xxxxxxxxxxx> wrote:
>
>
>
> On 8/6/24 00:10, Joanne Koong wrote:
> > On Mon, Aug 5, 2024 at 6:26 AM Bernd Schubert
> > <bernd.schubert@xxxxxxxxxxx> wrote:
> >>>> @@ -1280,6 +1389,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >>>>                  if (args->opcode == FUSE_SETXATTR)
> >>>>                          req->out.h.error = -E2BIG;
> >>>>                  fuse_request_end(req);
> >>>> +               fuse_put_request(req);
> >>>>                  goto restart;
> >>>
> >>> While rereading through fuse_dev_do_read, I just realized we also need
> >>> to handle the race condition for the error edge cases (here and in the
> >>> "goto out_end;"), since the timeout handler could have finished
> >>> executing by the time we hit the error edge case. We need to
> >>> test_and_set_bit(FR_FINISHING) so that either the timeout_handler or
> >>> dev_do_read cleans up the request, but not both. I'll fix this for v3.
> >>
> >> I know it would change semantics a bit, but wouldn't it be much easier /
> >> less racy if fuse_dev_do_read() would delete the timer when it takes a
> >> request from fiq->pending and add it back in (with new timeouts) before
> >> it returns the request?
> >>
> >
> > Ooo I really like this idea! I'm worried though that this might allow
> > potential scenarios where the fuse_dev_do_read gets descheduled after
> > disarming the timer and a non-trivial amount of time elapses before it
> > gets scheduled back (eg on a system where the CPU is starved), in
> > which case the fuse req_timeout value will be (somewhat of) a lie. If
> > you and others think this is likely fine though, then I'll incorporate
> > this into v3 which will make this logic a lot simpler :)
> >
>
> In my opinion we only need to worry about fuse server getting stuck. I
> think we would have a grave issue if fuse_dev_do_read() gets descheduled
> for a long time - the timer might not run either in that case. Main
> issue I see with removing/re-adding the timer - it might double the
> timeout in worst case. In my personal opinion acceptable as it reduces
> code complexity.
>

Awesome, thanks for this suggestion Bernd! I'll make this change for
v3, this will get rid of having to handle all the possible races
between dev_do_read and the timeout handler.

I'm planning to rearm the timer with its original req->timer.expires
(which was set to "jiffies +  fc->req_timeout" at the time the timer
was started the first time), so I think this will retain the original
timeout and won't add any extra time to it. And according to the timer
docs, "if @timer->expires is already in the past @timer will be queued
to expire at the next timer tick".

Thanks,
Joanne
>
> Thanks
> Bernd