Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests

Bernd Schubert <bernd.schubert@xxxxxxxxxxx> · Tue, 6 Aug 2024 17:43:44 +0200

On 8/6/24 00:10, Joanne Koong wrote:
> On Mon, Aug 5, 2024 at 6:26 AM Bernd Schubert
> <bernd.schubert@xxxxxxxxxxx> wrote:
>>>> @@ -1280,6 +1389,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>>>>                  if (args->opcode == FUSE_SETXATTR)
>>>>                          req->out.h.error = -E2BIG;
>>>>                  fuse_request_end(req);
>>>> +               fuse_put_request(req);
>>>>                  goto restart;
>>>
>>> While rereading through fuse_dev_do_read, I just realized we also need
>>> to handle the race condition for the error edge cases (here and in the
>>> "goto out_end;"), since the timeout handler could have finished
>>> executing by the time we hit the error edge case. We need to
>>> test_and_set_bit(FR_FINISHING) so that either the timeout_handler or
>>> dev_do_read cleans up the request, but not both. I'll fix this for v3.
>>
>> I know it would change semantics a bit, but wouldn't it be much easier /
>> less racy if fuse_dev_do_read() would delete the timer when it takes a
>> request from fiq->pending and add it back in (with new timeouts) before
>> it returns the request?
>>
> 
> Ooo I really like this idea! I'm worried though that this might allow
> potential scenarios where the fuse_dev_do_read gets descheduled after
> disarming the timer and a non-trivial amount of time elapses before it
> gets scheduled back (eg on a system where the CPU is starved), in
> which case the fuse req_timeout value will be (somewhat of) a lie. If
> you and others think this is likely fine though, then I'll incorporate
> this into v3 which will make this logic a lot simpler :)
> 

In my opinion we only need to worry about fuse server getting stuck. I
think we would have a grave issue if fuse_dev_do_read() gets descheduled
for a long time - the timer might not run either in that case. Main
issue I see with removing/re-adding the timer - it might double the
timeout in worst case. In my personal opinion acceptable as it reduces
code complexity.

Thanks
Bernd