On Fri, Jul 3, 2020 at 5:03 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Fri 12-06-20 12:33:24, Amir Goldstein wrote:
> > From: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> >
> > The fsnotify paths are trivial to hit even when there are no watchers and
> > they are surprisingly expensive. For example, every successful vfs_write()
> > hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless
> > FMODE_NONOTIFY is set which is an internal flag invisible to userspace.
> > As it stands, fsnotify_parent is a guaranteed functional call even if there
> > are no watchers and fsnotify() does a substantial amount of unnecessary
> > work before it checks if there are any watchers. A perf profile showed
> > that applying mnt->mnt_fsnotify_mask in fsnotify() was almost half of the
> > total samples taken in that function during a test. This patch rearranges
> > the fast paths to reduce the amount of work done when there are no
> > watchers.
> >
> > The test motivating this was "perf bench sched messaging --pipe". Despite
> > the fact the pipes are anonymous, fsnotify is still called a lot and
> > the overhead is noticeable even though it's completely pointless. It's
> > likely the overhead is negligible for real IO so this is an extreme
> > example. This is a comparison of hackbench using processes and pipes on
> > a 1-socket machine with 8 CPU threads without fanotify watchers.
> >
> >                             5.7.0                  5.7.0
> >                           vanilla      fastfsnotify-v1r1
> > Amean     1       0.4837 (   0.00%)      0.4630 *   4.27%*
> > Amean     3       1.5447 (   0.00%)      1.4557 (   5.76%)
> > Amean     5       2.6037 (   0.00%)      2.4363 (   6.43%)
> > Amean     7       3.5987 (   0.00%)      3.4757 (   3.42%)
> > Amean     12      5.8267 (   0.00%)      5.6983 (   2.20%)
> > Amean     18      8.4400 (   0.00%)      8.1327 (   3.64%)
> > Amean     24     11.0187 (   0.00%)     10.0290 *   8.98%*
> > Amean     30     13.1013 (   0.00%)     12.8510 (   1.91%)
> > Amean     32     13.9190 (   0.00%)     13.2410 (   4.87%)
> >
> >                        5.7.0       5.7.0
> >                      vanilla     fastfsnotify-v1r1
> > Duration User         157.05      152.79
> > Duration System      1279.98     1219.32
> > Duration Elapsed      182.81      174.52
> >
> > This is showing that the latencies are improved by roughly 2-9%. The
> > variability is not shown but some of these results are within the noise
> > as this workload heavily overloads the machine. That said, the system CPU
> > usage is reduced by quite a bit so it makes sense to avoid the overhead
> > even if it is a bit tricky to detect at times. A perf profile of just 1
> > group of tasks showed that 5.14% of samples taken were in either fsnotify()
> > or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify,
> > mostly function entry and the initial check for watchers. The check for
> > watchers is complicated enough that inlining it may be controversial.
> >
> > [Amir] Slightly simplify with mnt_or_sb_mask => marks_mask
> >
> > Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > ---
> >  fs/notify/fsnotify.c             | 27 +++++++++++++++------------
> >  include/linux/fsnotify.h         | 10 ++++++++++
> >  include/linux/fsnotify_backend.h |  4 ++--
> >  3 files changed, 27 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> > index 72d332ce8e12..d59a58d10b84 100644
> > --- a/fs/notify/fsnotify.c
> > +++ b/fs/notify/fsnotify.c
> > @@ -143,7 +143,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
> >  }
> >
> >  /* Notify this dentry's parent about a child's events. */
> > -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> > +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> >                      int data_type)
> >  {
> >         struct dentry *parent;
>
> Hum, should we actually remove the DCACHE_FSNOTIFY_PARENT_WATCHED check
> from here when it's moved to fsnotify_parent() inline helper?

No point. It is making a comeback in:
"fsnotify: send event with parent/name info to sb/mount/non-dir marks"

>
> > @@ -315,17 +315,11 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> >         struct fsnotify_iter_info iter_info = {};
> >         struct super_block *sb = to_tell->i_sb;
> >         struct mount *mnt = NULL;
> > -       __u32 mnt_or_sb_mask = sb->s_fsnotify_mask;
> >         int ret = 0;
> > -       __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> > +       __u32 test_mask, marks_mask;
> >
> > -       if (path) {
> > +       if (path)
> >                 mnt = real_mount(path->mnt);
> > -               mnt_or_sb_mask |= mnt->mnt_fsnotify_mask;
> > -       }
> > -       /* An event "on child" is not intended for a mount/sb mark */
> > -       if (mask & FS_EVENT_ON_CHILD)
> > -               mnt_or_sb_mask = 0;
> >
> >         /*
> >          * Optimization: srcu_read_lock() has a memory barrier which can
> > @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> >         if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks &&
> >             (!mnt || !mnt->mnt_fsnotify_marks))
> >                 return 0;
> > +
> > +       /* An event "on child" is not intended for a mount/sb mark */
> > +       marks_mask = to_tell->i_fsnotify_mask;
> > +       if (!(mask & FS_EVENT_ON_CHILD)) {
> > +               marks_mask |= sb->s_fsnotify_mask;
> > +               if (mnt)
> > +                       marks_mask |= mnt->mnt_fsnotify_mask;
> > +       }
> > +
> >         /*
> >          * if this is a modify event we may need to clear the ignored masks
> >          * otherwise return if neither the inode nor the vfsmount/sb care about
> >          * this type of event.
> >          */
> > -       if (!(mask & FS_MODIFY) &&
> > -           !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask)))
> > +       test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> > +       if (!(mask & FS_MODIFY) && !(test_mask & marks_mask))
> >                 return 0;
>
> Otherwise the patch looks good. One observation though: The (mask &
> FS_MODIFY) check means that all vfs_write() calls end up going through the
> "slower" path iterating all mark types and checking whether there are marks
> anyway. That could be relatively simply optimized using a hidden mask flag
> like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> needing special handling of FS_MODIFY... Not sure if we care enough at this
> point...

Yeh, that sounds low-hanging.

Actually, I don't think we need to define a flag for that.
__fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.
I will take a look at that as part of the FS_PRE_MODIFY work.

But in general, we should fight the urge to optimize theoretical
performance issues...

Thanks,
Amir.
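For illustration, a minimal sketch of the __fsnotify_recalc_mask() idea above,
written against the 5.7-era fs/notify/mark.c. This is not the actual patch;
the surrounding helpers and field names (fsnotify_conn_mask_p(),
fsnotify_valid_obj_type(), FSNOTIFY_MARK_FLAG_ATTACHED, ignored_mask) are
taken from that tree and may differ in other kernel versions:

/*
 * Sketch only (not the actual change): fold FS_MODIFY into the recalculated
 * object mask whenever a mark carries an ignored mask, so that fsnotify()
 * would no longer need the unconditional (mask & FS_MODIFY) escape in its
 * fast path.
 */
static void __fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
{
	u32 new_mask = 0;
	struct fsnotify_mark *mark;

	assert_spin_locked(&conn->lock);
	/* We can get a detached connector here when the inode is being unlinked. */
	if (!fsnotify_valid_obj_type(conn->type))
		return;
	hlist_for_each_entry(mark, &conn->list, obj_list) {
		if (!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED))
			continue;
		new_mask |= mark->mask;
		/*
		 * A mark with an ignored mask still needs to see FS_MODIFY
		 * so that the ignored mask can be cleared on modification.
		 */
		if (mark->ignored_mask)
			new_mask |= FS_MODIFY;
	}
	*fsnotify_conn_mask_p(conn) = new_mask;
}

With something like that in place, the fast-path test in fsnotify() could in
principle be reduced to a single !(test_mask & marks_mask) check, since any
object that needs modify events for ignored-mask clearing would already have
FS_MODIFY folded into its recalculated mask.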