Hi Joanne,

On 9/4/24 19:23, Joanne Koong wrote:
> On Tue, Sep 3, 2024 at 3:38 PM Bernd Schubert
>>
>>
>> I have question here, does it need to be an exact timeout or could it be
>> an interval/epoch? Let's say you timeout based on epoch lists? Example
>>
>> 1) epoch-a starts, requests are added to epoch-a list.
>> 2) epoch-b starts, epoch-a list should get empty
>> 3) epoch-c starts, epoch-b list should get empty, kill the connection if
>> epoch-a list is not empty (epoch-c list should not be needed, as epoch-a
>> list can be used, once confirmed it is empty.)
>>
>>
>> Here timeout would be epoch-a + epoch-b, i.e.
>> max-timeout <= 2 * epoch-time.
>> We could have more epochs/list-heads to make it more fine grained.
>>
>>
>> From my point of view that should be a rather cheap, as it just
>> adding/removing requests from list and checking for timeout if a list is
>> empty. With the caveat that it is not precise anymore.
>
> I like this idea a lot. I like that it enforces per-request behavior
> and guarantees that any stalled request will abort the connection. I
> think it's fine for the timeout to be an interval/epoch so long as the
> documentation explicitly makes that clear. I think this would need to
> be done in the kernel instead of libfuse because if the server is in a
> deadlock when there are no pending requests in the lists and then the
> kernel sends requests to the server, none of the requests will make it
> to the list for the timer handler to detect any issues.
>
> Before I make this change for v7, Miklos what are your thoughts on
> this direction?

We briefly discussed it with Miklos, and Miklos agreed that the epoch
list should be fine (would be great if you could quickly confirm,
Miklos).

In the meantime I have another use case for timeout lists. Basically
Jakob from CERN (in CC) is asking for a way to stop requests to
fuse-server and then to resume. I think that can be done easily through
notifications and unsetting (and later setting) fc->initialized.
A demo patch will follow around tomorrow, but then Jakob actually wants
to know when it is safe to restart fuse-server (or part of it). That is
where the epoch timeout list would be handy: the reply to the
notification should happen once the lists are empty, i.e. no request is
being handled anymore. I think this is better than FUSE_NOTIFY_RESEND,
as that has an issue with non-idempotent requests.

Thanks,
Bernd
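
For illustration, the epoch list scheme quoted above could look roughly
like the userspace model below. This is only a sketch under assumptions,
not the actual patch: struct req, struct epoch_timeout and all epoch_*()
helpers are hypothetical names, the lists are hand-rolled instead of the
kernel's struct list_head, and locking and the timer itself are left out.
epoch_tick() stands for the timer handler that would run once per epoch,
and epoch_idle() is the "all lists are empty" check that the stop/resume
notification reply could wait for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name here is hypothetical, not FUSE kernel API. */
struct req {
	struct req *prev, *next;	/* links within one epoch list */
};

struct epoch_timeout {
	struct req lists[2];	/* circular list heads: current and previous epoch */
	int cur;		/* index of the list that new requests are added to */
};

static void list_init(struct req *head)
{
	head->prev = head->next = head;
}

static bool list_is_empty(const struct req *head)
{
	return head->next == head;
}

void epoch_init(struct epoch_timeout *et)
{
	list_init(&et->lists[0]);
	list_init(&et->lists[1]);
	et->cur = 0;
}

/* Request goes out to the server: append it to the current epoch's list. */
void epoch_add(struct epoch_timeout *et, struct req *r)
{
	struct req *head;

	assert(et != NULL && r != NULL);
	head = &et->lists[et->cur];
	r->prev = head->prev;
	r->next = head;
	head->prev->next = r;
	head->prev = r;
}

/* Server replied: unlink the request from whichever list it is on. */
void epoch_del(struct req *r)
{
	r->prev->next = r->next;
	r->next->prev = r->prev;
	r->prev = r->next = r;
}

/*
 * Called at every epoch boundary. The list that is not current holds the
 * requests of the previous epoch; when the boundary before this one was
 * crossed it became "old", so anything still on it has survived two
 * boundaries. Returns false if the connection should be aborted, otherwise
 * reuses the (confirmed empty) old list for the new epoch.
 */
bool epoch_tick(struct epoch_timeout *et)
{
	int old = 1 - et->cur;

	if (!list_is_empty(&et->lists[old]))
		return false;
	et->cur = old;
	return true;
}

/* True when no request is in flight on any epoch list. */
bool epoch_idle(const struct epoch_timeout *et)
{
	return list_is_empty(&et->lists[0]) && list_is_empty(&et->lists[1]);
}
```

With two lists, a request added right after a tick is caught two ticks
later, and one added right before a tick is caught at the tick after
that, so the effective timeout lies between one and two epoch times,
which matches max-timeout <= 2 * epoch-time from the quoted proposal.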