Re: [PATCH v3 RESEND 1/2] fuse: Introduce a new notification type for resend pending requests

Miklos Szeredi <miklos@xxxxxxxxxx> · Thu, 4 Jan 2024 13:03:11 +0100

On Wed, 20 Dec 2023 at 09:49, Zhao Chen <winters.zc@xxxxxxxxxxxx> wrote:
>
> When a FUSE daemon panics and failover, we aim to minimize the impact on
> applications by reusing the existing FUSE connection. During this process,
> another daemon is employed to preserve the FUSE connection's file
> descriptor. The new started FUSE Daemon will takeover the fd and continue
> to provide service.
>
> However, it is possible for some inflight requests to be lost and never
> returned. As a result, applications awaiting replies would become stuck
> forever. To address this, we can resend these pending requests to the
> new started FUSE daemon.
>
> This patch introduces a new notification type "FUSE_NOTIFY_RESEND", which
> can trigger resending of the pending requests, ensuring they are properly
> processed again.
>
> Signed-off-by: Zhao Chen <winters.zc@xxxxxxxxxxxx>
> ---
>  fs/fuse/dev.c             | 64 +++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/fuse.h |  1 +
>  2 files changed, 65 insertions(+)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 1a8f82f478cb..a5a874b2f2e2 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1775,6 +1775,67 @@ static int fuse_notify_retrieve(struct fuse_conn *fc, unsigned int size,
>         return err;
>  }
>
> +/*
> + * Resending all processing queue requests.
> + *
> + * During a FUSE daemon panics and failover, it is possible for some inflight
> + * requests to be lost and never returned. As a result, applications awaiting
> + * replies would become stuck forever. To address this, we can use notification
> + * to trigger resending of these pending requests to the FUSE daemon, ensuring
> + * they are properly processed again.
> + *
> + * Please note that this strategy is applicable only to idempotent requests or
> + * if the FUSE daemon takes careful measures to avoid processing duplicated
> + * non-idempotent requests.
> + */
> +static void fuse_resend(struct fuse_conn *fc)
> +{
> +       struct fuse_dev *fud;
> +       struct fuse_req *req, *next;
> +       struct fuse_iqueue *fiq = &fc->iq;
> +       LIST_HEAD(to_queue);
> +       unsigned int i;
> +
> +       spin_lock(&fc->lock);
> +       if (!fc->connected) {
> +               spin_unlock(&fc->lock);
> +               return;
> +       }
> +
> +       list_for_each_entry(fud, &fc->devices, entry) {
> +               struct fuse_pqueue *fpq = &fud->pq;
> +
> +               spin_lock(&fpq->lock);
> +               list_for_each_entry_safe(req, next, &fpq->io, list) {

Handling of requests on fpq->io is tricky, since they are in the state
of being read or written by the fuse server. Re-queuing them in this
state is likely to result in some sort of corruption.

The simplest solution is to just ignore requests in the I/O state.  Is
this a good solution for your use case?
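
To make the idea concrete, here is a rough userspace model, not kernel
code and not a proposed patch. Everything in it (struct model_req, the
REQ_* states, model_resend()) is a hypothetical stand-in for the real
fuse_req handling; it only shows the shape of "move everything except
requests in the I/O state":

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct fuse_req or list API. */
enum req_state { REQ_PROCESSING, REQ_IO };

struct model_req {
	int id;
	enum req_state state;
	struct model_req *next;
};

/*
 * Move every request that is NOT in the I/O state from *src to the tail
 * of *dst, preserving order. Requests in the I/O state are left on *src
 * untouched. Returns the number of requests moved.
 */
static int model_resend(struct model_req **src, struct model_req **dst)
{
	struct model_req **pp = src;
	struct model_req **tail = dst;
	int moved = 0;

	/* Find the current tail of the destination list. */
	while (*tail)
		tail = &(*tail)->next;

	while (*pp) {
		struct model_req *req = *pp;

		if (req->state == REQ_IO) {
			/* Being read or written by the server: skip it. */
			pp = &req->next;
			continue;
		}

		/* Unlink from the source and append to the destination. */
		*pp = req->next;
		req->next = NULL;
		*tail = req;
		tail = &req->next;
		moved++;
	}

	return moved;
}
```

In this model a request in the I/O state simply stays where it is, so a
later resend would pick it up once it has left that state.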

Thanks,
Miklos