Re: [PATCH v3 RESEND 1/2] fuse: Introduce a new notification type for resend pending requests

On 2024/1/4 20:03, Miklos Szeredi wrote:
On Wed, 20 Dec 2023 at 09:49, Zhao Chen <winters.zc@xxxxxxxxxxxx> wrote:

When a FUSE daemon panics and fails over, we aim to minimize the impact on
applications by reusing the existing FUSE connection. During this process,
another daemon is employed to preserve the FUSE connection's file
descriptor. The newly started FUSE daemon will take over the fd and continue
to provide service.

However, it is possible for some inflight requests to be lost and never
returned. As a result, applications awaiting replies would become stuck
forever. To address this, we can resend these pending requests to the
newly started FUSE daemon.

This patch introduces a new notification type "FUSE_NOTIFY_RESEND", which
can trigger resending of the pending requests, ensuring they are properly
processed again.

Signed-off-by: Zhao Chen <winters.zc@xxxxxxxxxxxx>
---
  fs/fuse/dev.c             | 64 +++++++++++++++++++++++++++++++++++++++
  include/uapi/linux/fuse.h |  1 +
  2 files changed, 65 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1a8f82f478cb..a5a874b2f2e2 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1775,6 +1775,67 @@ static int fuse_notify_retrieve(struct fuse_conn *fc, unsigned int size,
         return err;
  }

+/*
+ * Resend all requests on the processing queues.
+ *
+ * When a FUSE daemon panics and fails over, it is possible for some inflight
+ * requests to be lost and never returned. As a result, applications awaiting
+ * replies would become stuck forever. To address this, we can use notification
+ * to trigger resending of these pending requests to the FUSE daemon, ensuring
+ * they are properly processed again.
+ *
+ * Please note that this strategy is applicable only to idempotent requests or
+ * if the FUSE daemon takes careful measures to avoid processing duplicated
+ * non-idempotent requests.
+ */
+static void fuse_resend(struct fuse_conn *fc)
+{
+       struct fuse_dev *fud;
+       struct fuse_req *req, *next;
+       struct fuse_iqueue *fiq = &fc->iq;
+       LIST_HEAD(to_queue);
+       unsigned int i;
+
+       spin_lock(&fc->lock);
+       if (!fc->connected) {
+               spin_unlock(&fc->lock);
+               return;
+       }
+
+       list_for_each_entry(fud, &fc->devices, entry) {
+               struct fuse_pqueue *fpq = &fud->pq;
+
+               spin_lock(&fpq->lock);
+               list_for_each_entry_safe(req, next, &fpq->io, list) {

Handling of requests on fpq->io is tricky, since they are in the state
of being read or written by the fuse server. Re-queuing them in this
state can likely result in some sort of corruption.

The simplest solution is to just ignore requests in the I/O state.  Is
this a good solution for your use case?

Thanks,
Miklos

Thank you for your insightful review!

The initial intention behind handling fpq->io was to ensure that we attempted to resend all the inflight requests. Upon reflection, I agree that only handling fpq->processing is sufficient for us. This solution is both reasonable and simple.

I will remove the code related to fpq->io in the upcoming patch.
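
To illustrate the simplified behavior, here is a minimal userspace sketch. It is not the kernel code and not the upcoming patch: every name in it (model_req, model_queue, model_pqueue, model_push, model_pop, model_resend) is a hypothetical stand-in, locking and the fc->connected check are omitted, and re-queued requests are appended to the tail of the pending queue purely for simplicity. It only shows the intended rule: requests on a processing list go back to pending, while requests on an io list are left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's fuse types. */
enum model_state { MODEL_PENDING, MODEL_IO, MODEL_PROCESSING };

struct model_req {
	unsigned long unique;		/* request id */
	enum model_state state;
	struct model_req *next;
};

/* Simple singly linked FIFO standing in for a kernel list. */
struct model_queue {
	struct model_req *head;
	struct model_req *tail;
	size_t len;
};

/* Stand-in for one device's queues: requests in I/O and in processing. */
struct model_pqueue {
	struct model_queue io;
	struct model_queue processing;
};

/* Append r to the tail of q and record its new state. */
void model_push(struct model_queue *q, struct model_req *r,
		enum model_state st)
{
	assert(q != NULL && r != NULL);
	r->state = st;
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
	q->len++;
}

/* Remove and return the head of q, or NULL if q is empty. */
struct model_req *model_pop(struct model_queue *q)
{
	struct model_req *r = q->head;

	if (!r)
		return NULL;
	q->head = r->next;
	if (!q->head)
		q->tail = NULL;
	r->next = NULL;
	q->len--;
	return r;
}

/*
 * Move every request on each processing list back to the pending queue.
 * Requests on the io lists are deliberately not touched, since they are
 * assumed to be in the middle of being read or written.
 * Returns the number of requests that were re-queued.
 */
size_t model_resend(struct model_queue *pending,
		    struct model_pqueue *pqs, size_t npq)
{
	size_t moved = 0;

	for (size_t i = 0; i < npq; i++) {
		struct model_req *r;

		while ((r = model_pop(&pqs[i].processing)) != NULL) {
			model_push(pending, r, MODEL_PENDING);
			moved++;
		}
	}
	return moved;
}
```

With one request on io and two on processing, model_resend() re-queues only the two processing requests and leaves the io request where it was.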

Best regards,
Zhao Chen



