On 8/6/24 00:10, Joanne Koong wrote: > On Mon, Aug 5, 2024 at 6:26 AM Bernd Schubert > <bernd.schubert@xxxxxxxxxxx> wrote: >>>> @@ -1280,6 +1389,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, >>>> if (args->opcode == FUSE_SETXATTR) >>>> req->out.h.error = -E2BIG; >>>> fuse_request_end(req); >>>> + fuse_put_request(req); >>>> goto restart; >>> >>> While rereading through fuse_dev_do_read, I just realized we also need >>> to handle the race condition for the error edge cases (here and in the >>> "goto out_end;"), since the timeout handler could have finished >>> executing by the time we hit the error edge case. We need to >>> test_and_set_bit(FR_FINISHING) so that either the timeout_handler or >>> dev_do_read cleans up the request, but not both. I'll fix this for v3. >> >> I know it would change semantics a bit, but wouldn't it be much easier / >> less racy if fuse_dev_do_read() would delete the timer when it takes a >> request from fiq->pending and add it back in (with new timeouts) before >> it returns the request? >> > > Ooo I really like this idea! I'm worried though that this might allow > potential scenarios where the fuse_dev_do_read gets descheduled after > disarming the timer and a non-trivial amount of time elapses before it > gets scheduled back (eg on a system where the CPU is starved), in > which case the fuse req_timeout value will be (somewhat of) a lie. If > you and others think this is likely fine though, then I'll incorporate > this into v3 which will make this logic a lot simpler :) > In my opinion we only need to worry about fuse server getting stuck. I think we would have a grave issue if fuse_dev_do_read() gets descheduled for a long time - the timer might not run either in that case. Main issue I see with removing/re-adding the timer - it might double the timeout in worst case. In my personal opinion acceptable as it reduces code complexity. Thanks Bernd