On Wed, 20 Dec 2023 at 09:49, Zhao Chen <winters.zc@xxxxxxxxxxxx> wrote:
>
> When a FUSE daemon panics and failover, we aim to minimize the impact on
> applications by reusing the existing FUSE connection. During this process,
> another daemon is employed to preserve the FUSE connection's file
> descriptor. The new started FUSE Daemon will takeover the fd and continue
> to provide service.
>
> However, it is possible for some inflight requests to be lost and never
> returned. As a result, applications awaiting replies would become stuck
> forever. To address this, we can resend these pending requests to the
> new started FUSE daemon.
>
> This patch introduces a new notification type "FUSE_NOTIFY_RESEND", which
> can trigger resending of the pending requests, ensuring they are properly
> processed again.
>
> Signed-off-by: Zhao Chen <winters.zc@xxxxxxxxxxxx>
> ---
>  fs/fuse/dev.c             | 64 +++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/fuse.h |  1 +
>  2 files changed, 65 insertions(+)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 1a8f82f478cb..a5a874b2f2e2 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1775,6 +1775,67 @@ static int fuse_notify_retrieve(struct fuse_conn *fc, unsigned int size,
>  	return err;
>  }
>
> +/*
> + * Resending all processing queue requests.
> + *
> + * During a FUSE daemon panics and failover, it is possible for some inflight
> + * requests to be lost and never returned. As a result, applications awaiting
> + * replies would become stuck forever. To address this, we can use notification
> + * to trigger resending of these pending requests to the FUSE daemon, ensuring
> + * they are properly processed again.
> + *
> + * Please note that this strategy is applicable only to idempotent requests or
> + * if the FUSE daemon takes careful measures to avoid processing duplicated
> + * non-idempotent requests.
> + */
> +static void fuse_resend(struct fuse_conn *fc)
> +{
> +	struct fuse_dev *fud;
> +	struct fuse_req *req, *next;
> +	struct fuse_iqueue *fiq = &fc->iq;
> +	LIST_HEAD(to_queue);
> +	unsigned int i;
> +
> +	spin_lock(&fc->lock);
> +	if (!fc->connected) {
> +		spin_unlock(&fc->lock);
> +		return;
> +	}
> +
> +	list_for_each_entry(fud, &fc->devices, entry) {
> +		struct fuse_pqueue *fpq = &fud->pq;
> +
> +		spin_lock(&fpq->lock);
> +		list_for_each_entry_safe(req, next, &fpq->io, list) {

Handling of requests on fpq->io is tricky, since they are in the state
of being read or written by the fuse server. Re-queuing them in this
state is likely to result in some sort of corruption.

The simplest solution is to just ignore requests in the I/O state. Is
this a good solution for your use case?

Thanks,
Miklos
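
To make the "ignore requests in the I/O state" suggestion concrete, here
is a minimal userspace model of it, not kernel code. Every name in it
(struct req, struct queue, queue_push(), queue_len(), resend_skip_io())
is hypothetical and only loosely mirrors the pending, processing and io
queues of the patch; locking is left out entirely. Only requests already
handed to the server and waiting for a reply are moved back to pending,
and the io queue is never touched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request states; a simplification of the kernel's flags. */
enum req_state { REQ_PENDING, REQ_IO, REQ_SENT };

struct req {
	int id;
	enum req_state state;
	struct req *next;
};

/* Singly linked FIFO standing in for a kernel list_head. */
struct queue {
	struct req *head;
	struct req *tail;
};

/* Append a request to the tail of a queue. */
static void queue_push(struct queue *q, struct req *r)
{
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

/* Count the requests currently on a queue. */
static size_t queue_len(const struct queue *q)
{
	size_t n = 0;
	const struct req *r;

	for (r = q->head; r; r = r->next)
		n++;
	return n;
}

/*
 * Move every request on 'processing' back to 'pending' so that it is
 * delivered to the server again. Requests in the I/O state live on a
 * separate queue that this function never sees, so they are skipped
 * by construction. Returns the number of requests re-queued.
 */
static size_t resend_skip_io(struct queue *pending, struct queue *processing)
{
	struct req *r = processing->head;
	size_t moved = 0;

	processing->head = NULL;
	processing->tail = NULL;

	while (r) {
		struct req *next = r->next; /* saved: queue_push() resets it */

		assert(r->state == REQ_SENT);
		r->state = REQ_PENDING;
		queue_push(pending, r);
		moved++;
		r = next;
	}
	return moved;
}
```

In this model a request that is on the io queue when the resend runs
simply stays there. Whether such a request can still be lost by a
crashed server, and so needs a later resend, is a question the sketch
does not answer.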