Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed 10-01-24 16:41:46, Amir Goldstein wrote:
> On Wed, Jan 10, 2024 at 3:56 PM Jan Kara <jack@xxxxxxx> wrote:
> > On Tue 09-01-24 21:48:18, Amir Goldstein wrote:
> > > Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> > > optimized the case where there are no fsnotify watchers on any of the
> > > filesystem's objects.
> > >
> > > It is quite common for a system to have a single local filesystem and
> > > it is quite common for the system to have some inotify watches on some
> > > config files or directories, so the optimization of no marks at all is
> > > often not in effect.
> >
> > I agree.
> >
> > > Access event watchers are far less common, so optimizing the case of
> > > no marks with access events could improve performance for more systems,
> > > especially for the performance sensitive hot io path.
> > >
> > > Maintain a per-sb counter of objects that have marks with access
> > > events in their mask and use that counter to optimize out the call to
> > > fsnotify() in fsnotify access hooks.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> >
> > I'm not saying no to this but let's discuss first before hacking in some
> > partial solution :). AFAIU what Jens sees is something similar as was
> > reported for example here [1]. In these cases we go through
> > fsnotify_parent() down to fsnotify() until the check:
> >
> >         if (!(test_mask & marks_mask))
> >                 return 0;
> >
> > there. And this is all relatively cheap (pure fetches like
> > sb->s_fsnotify_marks, mnt->mnt_fsnotify_marks, inode->i_fsnotify_marks,
> > sb->s_fsnotify_mask, mnt->mnt_fsnotify_mask, inode->i_fsnotify_mask) except
> > for parent handling in __fsnotify_parent(). That requires multiple atomic
> > operations - take_dentry_name_snapshot() is lock & unlock of d_lock, dget()
> > is cmpxchg on d_lockref, dput() is another cmpxchg on d_lockref - and this
> > gets really expensive, even more so if multiple threads race for the same
> > parent dentry.
> 
> Sorry, I forgot to link the Jens regression report [1] which included this
> partial perf report:
> 
>      3.36%             [kernel.vmlinux]  [k] fsnotify
>      2.32%             [kernel.vmlinux]  [k] __fsnotify_paren
> 
> Your analysis about __fsnotify_parent() may be correct, but what would be
> the explanation to time spent in fsnotify() in this report?

OK, I don't have a good explanation for why fsnotify() itself is so high in
Jens' profile. Maybe cacheline fetches caused by inode->i_fsnotify_mask and
inode->i_fsnotify_marks checks are causing the overhead but it would be
good to have it confirmed.

> In general, I think that previous optimization work as this commit by Mel:
> 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead when
> there is no watcher") showed improvements when checks were inlined.

Well, I believe that commit helped exactly because the check for
DCACHE_FSNOTIFY_PARENT_WATCHED was moved ahead of all the work now in
__fsnotify_parent().

> [1] https://lore.kernel.org/linux-fsdevel/53682ece-f0e7-48de-9a1c-879ee34b0449@xxxxxxxxx/
> 
> > So I think what we ideally need is a way to avoid this expensive "stabilize
> > parent & its name" game unless we are pretty sure we are going to generate
> > the event. There's no way to avoid fetching the parent early if
> > dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED (we can still postpone the
> > take_dentry_name_snapshot() cost though). However for the case where
> > dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED == 0 (and this is the case
> > here AFAICT) we need the parent only after the check of 'test_mask &
> > marks_mask'. So in that case we could completely avoid the extra cost of
> > parent dentry handling.
> >
> > So in principle I believe we could make fsnotify noticeably lighter for the
> > case where no event is generated. Just the question is how to refactor the
> > current set of functions to achieve this without creating an unmaintainable
> > mess. I suspect if we lifted creation & some prefilling of iter_info into
> > __fsnotify_parent() and then fill in the parent in case we need it for
> > reporting only in fsnotify(), the code could be reasonably readable. We'd
> > need to always propagate the dentry down to fsnotify() though, currently we
> > often propagate only the inode because the dentry may be (still in case of
> > create or already in case of unlink) negative. Similarly we'd somehow need
> > to communicate down into fsnotify() whether the passed name needs
> > snapshotting or not...
> >
> > What do you think?
> 
> I am not saying no ;)
> but it sound a bit complicated so if the goal is to reduce the overhead
> of fsnotify_access() and fsnotify_perm(), which I don't think any application
> cares about, then I'd rather go with a much simpler solution even if it
> does not cover all the corner cases.

OK, let's figure out what exactly causes slowdown in Jens' case first. I
agree your solution helps mitigate the cost of fsnotify_access() for reads
but I forsee people complaining about fsnotify_modify() cost for writes in
short order :) and there it is not so simple to solve as there's likely
some watch for FS_MODIFY event somewhere.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux