Re: [PATCH] fuse: add optional kernel-enforced timeout for fuse requests

Josef Bacik <josef@xxxxxxxxxxxxxx> · Mon, 22 Jul 2024 16:52:17 -0400

On Mon, Jul 22, 2024 at 11:58:03AM -0700, Joanne Koong wrote:
> On Fri, Jul 19, 2024 at 6:06 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Jul 18, 2024 at 05:37:06PM -0700, Joanne Koong wrote:
> > > On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert
> > > <bs_lists@xxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > >
> > > >
> > > > On 7/18/24 07:24, Joanne Koong wrote:
> > > > > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert
> > > > > <bernd.schubert@xxxxxxxxxxx> wrote:
> > > > >>
> > > > >> Hi Joanne,
> > > > >>
> > > > >> On 7/17/24 23:34, Joanne Koong wrote:
> > > > >>> There are situations where fuse servers can become unresponsive or take
> > > > >>> too long to reply to a request. Currently there is no upper bound on
> > > > >>> how long a request may take, which may be frustrating to users who get
> > > > >>> stuck waiting for a request to complete.
> > > > >>>
> > > > >>> This commit adds a daemon timeout option (in seconds) for fuse requests.
> > > > >>> If the timeout elapses before the request is replied to, the request will
> > > > >>> fail with -ETIME.
> > > > >>>
> > > > >>> There are 3 possibilities for a request that times out:
> > > > >>> a) The request times out before the request has been sent to userspace
> > > > >>> b) The request times out after the request has been sent to userspace
> > > > >>> and before it receives a reply from the server
> > > > >>> c) The request times out after the request has been sent to userspace
> > > > >>> and the server replies while the kernel is timing out the request
> > > > >>>
> > > > >>> Proper synchronization must be added to ensure that the request is
> > > > >>> handled correctly in all of these cases. To this effect, there is a new
> > > > >>> FR_PROCESSING bit added to the request flags, which is set atomically by
> > > > >>> either the timeout handler (see fuse_request_timeout()) which is invoked
> > > > >>> after the request timeout elapses or set by the request reply handler
> > > > >>> (see dev_do_write()), whichever gets there first.
> > > > >>>
> > > > >>> If the reply handler and the timeout handler are executing simultaneously
> > > > >>> and the reply handler sets FR_PROCESSING before the timeout handler, then
> > > > >>> the request is re-queued onto the waitqueue and the kernel will process the
> > > > >>> reply as though the timeout did not elapse. If the timeout handler sets
> > > > >>> FR_PROCESSING before the reply handler, then the request will fail with
> > > > >>> -ETIME and the request will be cleaned up.
> > > > >>>
> > > > >>> Proper acquires on the request reference must be added to ensure that the
> > > > >>> timeout handler does not drop the last refcount on the request while the
> > > > >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> > > > >>> still accessing the request. (By "forwarder handler", this is the handler
> > > > >>> that forwards the request to userspace).
> > > > >>>
> > > > >>> Currently, this is the lifecycle of the request refcount:
> > > > >>>
> > > > >>> Request is created:
> > > > >>> fuse_simple_request -> allocates request, sets refcount to 1
> > > > >>>   __fuse_request_send -> acquires refcount
> > > > >>>     queues request and waits for reply...
> > > > >>> fuse_simple_request -> drops refcount
> > > > >>>
> > > > >>> Request is freed:
> > > > >>> fuse_dev_do_write
> > > > >>>   fuse_request_end -> drops refcount on request
> > > > >>>
> > > > >>> The timeout handler drops the refcount on the request so that the
> > > > >>> request is properly cleaned up if a reply is never received. Because of
> > > > >>> this, both the forwarder handler and the reply handler must acquire a refcount
> > > > >>> on the request while it accesses the request, and the refcount must be
> > > > >>> acquired while the lock of the list the request is on is held.
> > > > >>>
> > > > >>> There is a potential race if the request is being forwarded to
> > > > >>> userspace while the timeout handler is executing (eg FR_PENDING has
> > > > >>> already been cleared but dev_do_read() hasn't finished executing). This
> > > > >>> is a problem because this would free the request but the request has not
> > > > >>> been removed from the fpq list it's on. To prevent this, dev_do_read()
> > > > >>> must check FR_PROCESSING at the end of its logic and remove the request
> > > > >>> from the fpq list if the timeout occurred.
> > > > >>>
> > > > >>> There is also the case where the connection may be aborted or the
> > > > >>> device may be released while the timeout handler is running. To protect
> > > > >>> against an extra refcount drop on the request, the timeout handler
> > > > >>> checks the connected state of the list and lets the abort handler drop the
> > > > >>> last reference if the abort is running simultaneously. Similarly, the
> > > > >>> timeout handler also needs to check if the req->out.h.error is set to
> > > > >>> -ESTALE, which indicates that the device release is cleaning up the
> > > > >>> request. In both these cases, the timeout handler will return without
> > > > >>> dropping the refcount.
> > > > >>>
> > > > >>> Please also note that background requests are not applicable for timeouts
> > > > >>> since they are asynchronous.
> > > > >>
> > > > >>
> > > > >> This and that thread here actually make me wonder if this is the right
> > > > >> approach
> > > > >>
> > > > >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@xxxxxxxxxx/T/
> > > > >>
> > > > >>
> > > > >> In  th3 thread above a request got interrupted, but fuse-server still
> > > > >> does not manage stop it. From my point of view, interrupting a request
> > > > >> suggests to add a rather short kernel lifetime for it. With that one
> > > > >
> > > > > Hi Bernd,
> > > > >
> > > > > I believe this solution fixes the problem outlined in that thread
> > > > > (namely, that the process gets stuck waiting for a reply). If the
> > > > > request is interrupted before it times out, the kernel will wait with
> > > > > a timeout again on the request (timeout would start over, but the
> > > > > request will still eventually sooner or later time out). I'm not sure
> > > > > I agree that we want to cancel the request altogether if it's
> > > > > interrupted. For example, if the user uses the user-defined signal
> > > > > SIGUSR1, it can be unexpected and arbitrary behavior for the request
> > > > > to be aborted by the kernel. I also don't think this can be consistent
> > > > > for what the fuse server will see since some requests may have already
> > > > > been forwarded to userspace when the request is aborted and some
> > > > > requests may not have.
> > > > >
> > > > > I think if we were to enforce that the request should be aborted when
> > > > > it's interrupted regardless of whether a timeout is specified or not,
> > > > > then we should do it similarly to how the timeout handler logic
> > > > > handles it in this patch,rather than the implementation in the thread
> > > > > linked above (namely, that the request should be explicitly cleaned up
> > > > > immediately instead of when the interrupt request sends a reply); I
> > > > > don't believe the implementation in the link handles the case where
> > > > > for example the fuse server is in a deadlock and does not reply to the
> > > > > interrupt request. Also, as I understand it, it is optional for
> > > > > servers to reply or not to the interrupt request.
> > > >
> > > > Hi Joanne,
> > > >
> > > > yeah, the solution in the link above is definitely not ideal and I think
> > > > a timout based solution would be better. But I think your patch wouldn't
> > > > work either right now, unless server side sets a request timeout.
> > > > Btw, I would rename the variable 'daemon_timeout' to somethink like
> > > > req_timeout.
> > > >
> > > Hi Bernd,
> > >
> > > I think we need to figure out if we indeed want the kernel to abort
> > > interrupted requests if no request timeout was explicitly set by the
> > > server. I'm leaning towards no, for the reasons in my previous reply;
> > > in addition to that I'm also not sure if we would be potentially
> > > breaking existing filesystems if we introduced this new behavior.
> > > Curious to hear your and others' thoughts on this.
> > >
> > > (Btw, if we did want to add this in, i think the change would be
> > > actually pretty simple. We could pretty much just reuse all the logic
> > > that's implemented in the timeout handling code. It's very much the
> > > same scenario (request getting aborted and needing to protect against
> > > races with different handlers))
> > >
> > > I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion.
> > >
> > > > >
> > > > >> either needs to wake up in intervals and check if request timeout got
> > > > >> exceeded or it needs to be an async kernel thread. I think that async
> > > > >> thread would also allow to give a timeout to background requests.
> > > > >
> > > > > in my opinion, background requests do not need timeouts. As I
> > > > > understand it, background requests are used only for direct i/o async
> > > > > read/writes, writing back dirty pages,and readahead requests generated
> > > > > by the kernel. I don't think fuse servers would have a need for timing
> > > > > out background requests.
> > > >
> > > > There is another discussion here, where timeouts are a possible although
> > > > ugly solution to avoid page copies
> > > >
> > > > https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@xxxxxxxxxxx/T/
> > > >
> > > Thanks for the link, it's an interesting read.
> > >
> > > >
> > > > That is the bg writeback code path.
> > > >
> > > > >
> > > > >>
> > > > >> Or we add an async timeout to bg and interupted requests additionally?
> > > > >
> > > > > The interrupted request will already have a timeout on it since it
> > > > > waits with a timeout again for the reply after it's interrupted.
> > > >
> > > > If daemon side configures timeouts. And interrupted requests might want
> > > > to have a different timeout. I will check when I'm back if we can update
> > > > your patch a bit for that.
> > > >
> > > > Your patch hooks in quite nicely and basically without overhead into fg
> > > > (sync) requests. Timing out bg requests will have a bit overhead (unless
> > > > I miss something), so maybe we need two solutions here. And if we want
> > > > to go that route at all, to avoid these extra fuse page copies.
> > > >
> > > Agreed, I think if we do decide to go down this route, it seems
> > > cleaner to me to have the background request timeouts handled
> > > separately. Maybe something like having a timer per batch (where
> > > "batch" is the group of requests that get flushed at the same time)?
> > > That seems to me like the approach with the least overhead.
> > >
> >
> > I don't want to have a bunch of different timeouts, we should just have one and
> > have consistent behavior across all classes of requests.
> >
> > I think the only thing we should have that is "separate" is a way to set request
> > timeouts that aren't set by the daemon itself.  Administrators should be able to
> > set a per-mount timeout via sysfs/algo in order to have some sort of control
> > over possibly malicious FUSE file systems.
> >
> > But that should just tie into whatever mechanism you come up with, and
> > everything should all behave consistently with that timeout.  Thanks,
> >
> > Josef
> 
> To summarize this thread so far, there are 2 open questions:
> 1) should interrupted requests be automatically aborted/cancelled by
> default (even if no timeout is set)?
> 2) should background requests also have some timeout enforced on them?

Yes I think background requests should have a timeout enforced on them if it's
set.  Page writeout is actually one of the bigger problems because stuff will
just hang forever, like if you hit sync or something (which a lot of
applications do).

For #1 I have to think some more and look at what the mechanics/expectations of
those requests are, but if it's a thing you can leave for a follup them that
sounds good.

Additionally I think leaving the extra page copy thing as future work once this
work is done is the best bet.

Miklos are you around?  We've had a few different patches/discussions going on.
I assume you are/have been on summer vacation.  Thanks,

Josef