Re: [PATCH] fuse: add a new "connections" file to show longest waiting reqeust

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 3, 2025 at 9:42 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Mon, 2025-02-03 at 09:31 -0800, Joanne Koong wrote:
> > On Mon, Feb 3, 2025 at 8:37 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > >
> > > Add a new file to the "connections" directory that shows how long (in
> > > seconds) the oldest fuse_req in the processing hash or pending queue has
> > > been waiting.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > ---
> > > This is based on top of Joanne's work, as it requires the "create_time"
> > > field in fuse_req.  We have some internal detection of hung fuse server
> > > processes that relies on seeing elevated values in the "waiting" sysfs
> > > file. The problem with that method is that it can't detect when highly
> > > serialized workloads on a FUSE mount are hung. This adds another metric
> > > that we can use to detect when fuse mounts are hung.
> > > ---
> > >  fs/fuse/control.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/fuse/fuse_i.h  |  2 +-
> > >  2 files changed, 57 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/fuse/control.c b/fs/fuse/control.c
> > > index 2a730d88cc3bdb50ea1f8a3185faad5f05fc6e74..b213db11a2d7d85c4403baa61f9f7850fed150a8 100644
> > > --- a/fs/fuse/control.c
> > > +++ b/fs/fuse/control.c
> > > @@ -180,6 +180,55 @@ static ssize_t fuse_conn_congestion_threshold_write(struct file *file,
> > >         return ret;
> > >  }
> > >
> > > +/* Show how long (in s) the oldest request has been waiting */
> > > +static ssize_t fuse_conn_oldest_read(struct file *file, char __user *buf,
> > > +                                     size_t len, loff_t *ppos)
> > > +{
> > > +       char tmp[32];
> > > +       size_t size;
> > > +       unsigned long oldest = jiffies;
> > > +
> > > +       if (!*ppos) {
> > > +               struct fuse_conn *fc = fuse_ctl_file_conn_get(file);
> > > +               struct fuse_iqueue *fiq = &fc->iq;
> > > +               struct fuse_dev *fud;
> > > +               struct fuse_req *req;
> > > +
> > > +               if (!fc)
> > > +                       return 0;
> > > +
> > > +               spin_lock(&fc->lock);
> > > +               list_for_each_entry(fud, &fc->devices, entry) {
> > > +                       struct fuse_pqueue *fpq = &fud->pq;
> > > +                       int i;
> > > +
> > > +                       spin_lock(&fpq->lock);
> > > +                       for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> > > +                               if (list_empty(&fpq->processing[i]))
> > > +                                       continue;
> > > +                               /*
> > > +                                * Only check the first request in the queue. The
> > > +                                * assumption is that the one at the head of the list
> > > +                                * will always be the oldest.
> > > +                                */
> > > +                               req = list_first_entry(&fpq->processing[i], struct fuse_req, list);
> >
> > This probably doesn't matter in actuality, but maybe
> > list_first_entry_or_null() on fpq->processing[i] would be more optimal
> > here than "list_empty()" and "list_first_entry()" since that'll
> > minimize the number of READ_ONCE() calls we'd need to do.
> >
>
> I don't think the above will do more than one READ_ONCE, but I agree
> that list_first_entry_or_null() is more idiomatic. I'll switch to that.

Ah I just checked and you're right, it doesn't. I must have
misremembered this from the last time I looked at list.h

>
> > > +                               if (req->create_time < oldest)
> > > +                                       oldest = req->create_time;
> > > +                       }
> > > +                       spin_unlock(&fpq->lock);
> > > +               }
> > > +               if (!list_empty(&fiq->pending)) {
> >
> > I think we'll need to grab the fiq->lock here first before checking fiq->pending
> >
>
> Doh! Will fix.
>
> > > +                       req = list_first_entry(&fiq->pending, struct fuse_req, list);
> > > +                       if (req->create_time < oldest)
> > > +                               oldest = req->create_time;
> > > +               }
> > > +               spin_unlock(&fc->lock);
> > > +               fuse_conn_put(fc);
> > > +       }
> > > +       size = sprintf(tmp, "%ld\n", (jiffies - oldest)/HZ);
> >
> > If there are no requests, I think this will still return a non-zero
> > value since jiffies is a bit more than what the last "oldest =
> > jiffies" was, which might be confusing. Maybe we should just return 0
> > in this case?
> >
> >
>
> You should only see a non-zero value in that case if it takes more than
> a second to walk the hash. Possible, but pretty unlikely.
>
>
> > > +       return simple_read_from_buffer(buf, len, ppos, tmp, size);
> > > +}
> > > +
> > >  static const struct file_operations fuse_ctl_abort_ops = {
> > >         .open = nonseekable_open,
> > >         .write = fuse_conn_abort_write,
> > > @@ -202,6 +251,11 @@ static const struct file_operations fuse_conn_congestion_threshold_ops = {
> > >         .write = fuse_conn_congestion_threshold_write,
> > >  };
> > >
> > > +static const struct file_operations fuse_ctl_oldest_ops = {
> > > +       .open = nonseekable_open,
> > > +       .read = fuse_conn_oldest_read,
> > > +};
> > > +
> > >  static struct dentry *fuse_ctl_add_dentry(struct dentry *parent,
> > >                                           struct fuse_conn *fc,
> > >                                           const char *name,
> > > @@ -264,6 +318,8 @@ int fuse_ctl_add_conn(struct fuse_conn *fc)
> > >
> > >         if (!fuse_ctl_add_dentry(parent, fc, "waiting", S_IFREG | 0400, 1,
> > >                                  NULL, &fuse_ctl_waiting_ops) ||
> > > +           !fuse_ctl_add_dentry(parent, fc, "oldest", S_IFREG | 0400, 1,
> > > +                                NULL, &fuse_ctl_oldest_ops) ||
> > >             !fuse_ctl_add_dentry(parent, fc, "abort", S_IFREG | 0200, 1,
> > >                                  NULL, &fuse_ctl_abort_ops) ||
> > >             !fuse_ctl_add_dentry(parent, fc, "max_background", S_IFREG | 0600,
> > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > index dcc1c327a0574b1fd1adda4b7ca047aa353b6a0a..b46c26bc977ad2d75d10fb306d3ecc4caf2c53bd 100644
> > > --- a/fs/fuse/fuse_i.h
> > > +++ b/fs/fuse/fuse_i.h
> > > @@ -42,7 +42,7 @@
> > >  #define FUSE_NAME_MAX 1024
> > >
> > >  /** Number of dentries for each connection in the control filesystem */
> > > -#define FUSE_CTL_NUM_DENTRIES 5
> > > +#define FUSE_CTL_NUM_DENTRIES 6
> > >
> > >  /* Frequency (in seconds) of request timeout checks, if opted into */
> > >  #define FUSE_TIMEOUT_TIMER_FREQ 15
> > >
> > > ---
> > > base-commit: 9afd7336f3acbe5678cca3b3bc5baefb51ce9564
> > > change-id: 20250203-fuse-sysfs-ce351d105cf0
> > >
> > > Best regards,
> > > --
> > > Jeff Layton <jlayton@xxxxxxxxxx>
> > >
> > >
>
> Thanks for the review!
> --
> Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux