On 7/18/24 07:24, Joanne Koong wrote: > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert > <bernd.schubert@xxxxxxxxxxx> wrote: >> >> Hi Joanne, >> >> On 7/17/24 23:34, Joanne Koong wrote: >>> There are situations where fuse servers can become unresponsive or take >>> too long to reply to a request. Currently there is no upper bound on >>> how long a request may take, which may be frustrating to users who get >>> stuck waiting for a request to complete. >>> >>> This commit adds a daemon timeout option (in seconds) for fuse requests. >>> If the timeout elapses before the request is replied to, the request will >>> fail with -ETIME. >>> >>> There are 3 possibilities for a request that times out: >>> a) The request times out before the request has been sent to userspace >>> b) The request times out after the request has been sent to userspace >>> and before it receives a reply from the server >>> c) The request times out after the request has been sent to userspace >>> and the server replies while the kernel is timing out the request >>> >>> Proper synchronization must be added to ensure that the request is >>> handled correctly in all of these cases. To this effect, there is a new >>> FR_PROCESSING bit added to the request flags, which is set atomically by >>> either the timeout handler (see fuse_request_timeout()) which is invoked >>> after the request timeout elapses or set by the request reply handler >>> (see dev_do_write()), whichever gets there first. >>> >>> If the reply handler and the timeout handler are executing simultaneously >>> and the reply handler sets FR_PROCESSING before the timeout handler, then >>> the request is re-queued onto the waitqueue and the kernel will process the >>> reply as though the timeout did not elapse. If the timeout handler sets >>> FR_PROCESSING before the reply handler, then the request will fail with >>> -ETIME and the request will be cleaned up. >>> >>> Proper acquires on the request reference must be added to ensure that the >>> timeout handler does not drop the last refcount on the request while the >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is >>> still accessing the request. (By "forwarder handler", this is the handler >>> that forwards the request to userspace). >>> >>> Currently, this is the lifecycle of the request refcount: >>> >>> Request is created: >>> fuse_simple_request -> allocates request, sets refcount to 1 >>> __fuse_request_send -> acquires refcount >>> queues request and waits for reply... >>> fuse_simple_request -> drops refcount >>> >>> Request is freed: >>> fuse_dev_do_write >>> fuse_request_end -> drops refcount on request >>> >>> The timeout handler drops the refcount on the request so that the >>> request is properly cleaned up if a reply is never received. Because of >>> this, both the forwarder handler and the reply handler must acquire a refcount >>> on the request while it accesses the request, and the refcount must be >>> acquired while the lock of the list the request is on is held. >>> >>> There is a potential race if the request is being forwarded to >>> userspace while the timeout handler is executing (eg FR_PENDING has >>> already been cleared but dev_do_read() hasn't finished executing). This >>> is a problem because this would free the request but the request has not >>> been removed from the fpq list it's on. To prevent this, dev_do_read() >>> must check FR_PROCESSING at the end of its logic and remove the request >>> from the fpq list if the timeout occurred. >>> >>> There is also the case where the connection may be aborted or the >>> device may be released while the timeout handler is running. To protect >>> against an extra refcount drop on the request, the timeout handler >>> checks the connected state of the list and lets the abort handler drop the >>> last reference if the abort is running simultaneously. Similarly, the >>> timeout handler also needs to check if the req->out.h.error is set to >>> -ESTALE, which indicates that the device release is cleaning up the >>> request. In both these cases, the timeout handler will return without >>> dropping the refcount. >>> >>> Please also note that background requests are not applicable for timeouts >>> since they are asynchronous. >> >> >> This and that thread here actually make me wonder if this is the right >> approach >> >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@xxxxxxxxxx/T/ >> >> >> In th3 thread above a request got interrupted, but fuse-server still >> does not manage stop it. From my point of view, interrupting a request >> suggests to add a rather short kernel lifetime for it. With that one > > Hi Bernd, > > I believe this solution fixes the problem outlined in that thread > (namely, that the process gets stuck waiting for a reply). If the > request is interrupted before it times out, the kernel will wait with > a timeout again on the request (timeout would start over, but the > request will still eventually sooner or later time out). I'm not sure > I agree that we want to cancel the request altogether if it's > interrupted. For example, if the user uses the user-defined signal > SIGUSR1, it can be unexpected and arbitrary behavior for the request > to be aborted by the kernel. I also don't think this can be consistent > for what the fuse server will see since some requests may have already > been forwarded to userspace when the request is aborted and some > requests may not have. > > I think if we were to enforce that the request should be aborted when > it's interrupted regardless of whether a timeout is specified or not, > then we should do it similarly to how the timeout handler logic > handles it in this patch,rather than the implementation in the thread > linked above (namely, that the request should be explicitly cleaned up > immediately instead of when the interrupt request sends a reply); I > don't believe the implementation in the link handles the case where > for example the fuse server is in a deadlock and does not reply to the > interrupt request. Also, as I understand it, it is optional for > servers to reply or not to the interrupt request. Hi Joanne, yeah, the solution in the link above is definitely not ideal and I think a timout based solution would be better. But I think your patch wouldn't work either right now, unless server side sets a request timeout. Btw, I would rename the variable 'daemon_timeout' to somethink like req_timeout. > >> either needs to wake up in intervals and check if request timeout got >> exceeded or it needs to be an async kernel thread. I think that async >> thread would also allow to give a timeout to background requests. > > in my opinion, background requests do not need timeouts. As I > understand it, background requests are used only for direct i/o async > read/writes, writing back dirty pages,and readahead requests generated > by the kernel. I don't think fuse servers would have a need for timing > out background requests. There is another discussion here, where timeouts are a possible although ugly solution to avoid page copies https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@xxxxxxxxxxx/T/ That is the bg writeback code path. > >> >> Or we add an async timeout to bg and interupted requests additionally? > > The interrupted request will already have a timeout on it since it > waits with a timeout again for the reply after it's interrupted. If daemon side configures timeouts. And interrupted requests might want to have a different timeout. I will check when I'm back if we can update your patch a bit for that. Your patch hooks in quite nicely and basically without overhead into fg (sync) requests. Timing out bg requests will have a bit overhead (unless I miss something), so maybe we need two solutions here. And if we want to go that route at all, to avoid these extra fuse page copies. > >> >> >> (I only basically reviewed, can't do carefully right now - on vacation >> and as I just noticed, on a laptop that gives me electric shocks when I >> connect it to power.) > > No worries, thanks for your comments and hope you have a great > vacation (without getting shocked :))! Thank you! For now I'm not connecting power, 3h of battery left :) Thanks, Bernd