Re: [PATCH v7 2/3] fuse: add optional kernel-enforced timeout for requests

Joanne Koong <joannelkoong@xxxxxxxxx> · Wed, 9 Oct 2024 17:44:55 -0700

On Wed, Oct 9, 2024 at 1:14 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Mon, 7 Oct 2024 at 20:43, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is deadlocked. Currently, there's no
> > good way to detect if a server is stuck and needs to be killed manually.
> >
> > This commit adds an option for enforcing a timeout (in minutes) for
> > requests where if the timeout elapses without the server responding to
> > the request, the connection will be automatically aborted.
> >
> > Please note that these timeouts are not 100% precise. The request may
> > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the set max timeout
> > due to how it's internally implemented.
>
> One thing I worry about is adding more roadblocks on the way to making
> request queuing more scalable.
>
> Currently there's fc->num_waiting that's touched on all requests and
> bg_queue/bg_lock that are touched on background requests.  We should
> be trying to fix these bottlenecks instead of adding more.
>
> Can't we use the existing lists to scan requests?

Hi Miklos,

The existing lists we have are:
* fiq pending list (per connection)
* fpq io list and fpq processing (for its associated hash) list (per fuse dev)
* bg queue (per connection)

If we wanted to reuse existing lists, we could do this in the timeout handler:
* grab the fiq lock, check the head entry of the pending list, release the lock
* grab the bg lock, check the head entry of the bg_queue, release the lock
* for each connection's fuse dev, grab the fpq lock, check the head
entry of the fpq->io list, iterate through the fpq->processing's lists
for 256 hashes and check against the head entry, release the lock

but some requests could slip through for the following cases:
-- resend:
* Request is on the resend's to_queue list when the timeout handler
check runs, in which case if that request is expired we won't get to
that until the next time the timeout handler kicks in
* A re-sent request may be moved to the head of the fiq->pending list,
but have a creation time newer than other entries on the fiq->pending
list , in which case we would not time out and abort the connection
when we should be doing so
-- transitioning between lists
* A request that is between lists (eg fpq->io and fpq->processing)
could be missed when the timeout handler check runs (but will probably
be caught the next time the timeout handler kicks in. We could also
modify the logic in dev_do_read to use list_move to avoid this case).

I think it's fine for these edge cases to slip through since most of
them will be caught eventually by the subsequent timeout handler runs,
but I was more worried about the increased lock contention while
iterating through all hashes of the fpq->processing list. But I think
for that we could just increase the timeout frequency to run less
frequently (eg once every 5 minutes instead of once every minute)

Do you think something like this sounds more reasonable?

Alternatively, I also still like the idea of something looser with
just periodically (according to whatever specified timeout) checking
if any requests are being serviced at all when fc->num-waiting is
non-zero. However, this would only protect against fully deadlocked
servers and miss malicious ones or half-deadlocked ones (eg
multithreaded fuse servers where only some threads are deadlocked).

Thanks,
Joanne
>
> It's more complex, obviously, but at least it doesn't introduce yet
> another per-fc list to worry about.
>
> Thanks,
> Miklos