Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers

Amir Goldstein <amir73il@xxxxxxxxx> · Thu, 11 Jan 2024 08:27:08 +0200

> > I am not saying no ;)
> > but it sounds a bit complicated, so if the goal is to reduce the overhead
> > of fsnotify_access() and fsnotify_perm(), which I don't think any application
> > cares about, then I'd rather go with a much simpler solution even if it
> > does not cover all the corner cases.
>
> OK, let's figure out what exactly causes slowdown in Jens' case first. I
> agree your solution helps mitigate the cost of fsnotify_access() for reads
> but I foresee people complaining about fsnotify_modify() cost for writes in
> short order :) and there it is not so simple to solve as there's likely
> some watch for FS_MODIFY event somewhere.
>

Actually, I think we may be able to eat the cake and leave it whole.
As I've written to you once in a different context, I think fsnotify
has two mostly non-overlapping use cases:
1. Watch FS_ACCESS/FS_MODIFY on selective inodes (a.k.a. tail)
2. Watch sb/mount/recursive subtree

Anyone who tries watching FS_ACCESS/FS_MODIFY on a large
data set will find that to be way too noisy to be useful.

For example, we have a filesystem monitor application which needs to
know about changes to any file in the filesystem, but we cannot use
FS_MODIFY for that monitor because it is too noisy, so we use
FS_CLOSE_WRITE as second best.

I guess we cannot rule out that the use case of watching direct
children for FS_ACCESS/FS_MODIFY may exist in the wild.

IOW, I think we should be able to optimize most of the access/modify
hooks by checking those events specifically in the inline helpers for:
1. inode mask
2. parent watching children mask
3. sb->s_iflags & SB_I_FSNOTIFY_ACCESS_MONITOR

The use of such an access monitor is currently unlikely, so
I think tainting the sb is good enough for now.
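
To make the three checks concrete, here is a minimal userspace sketch of
what such an inline helper could look like. It is a mock, not kernel code:
struct mock_sb, struct mock_inode and mock_access_event_watched() are
made-up names, the SB_I_FSNOTIFY_ACCESS_MONITOR bit value is arbitrary,
and how the parent mask is actually looked up in the kernel is glossed over.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Event bits for this mock only; values are chosen for illustration. */
#define FS_ACCESS          0x00000001u
#define FS_MODIFY          0x00000002u
#define FS_EVENT_ON_CHILD  0x08000000u

/* Flag name as proposed in this mail; the bit value is an assumption. */
#define SB_I_FSNOTIFY_ACCESS_MONITOR 0x1ul

/* Hypothetical stand-ins for struct super_block and struct inode. */
struct mock_sb {
	unsigned long s_iflags;
};

struct mock_inode {
	uint32_t i_fsnotify_mask;       /* events watched on this inode */
	struct mock_inode *parent;      /* NULL for the root */
	struct mock_sb *sb;
};

/*
 * Return true if an access/modify event on @inode may have a watcher,
 * i.e. the slow path must be taken. Return false to skip it entirely.
 */
static inline bool mock_access_event_watched(const struct mock_inode *inode,
					     uint32_t event)
{
	const struct mock_inode *parent = inode->parent;

	/* 1. inode mask */
	if (inode->i_fsnotify_mask & event)
		return true;

	/* 2. parent watching children mask */
	if (parent &&
	    (parent->i_fsnotify_mask & FS_EVENT_ON_CHILD) &&
	    (parent->i_fsnotify_mask & event))
		return true;

	/* 3. sb tainted by an sb/mount-wide access monitor */
	return (inode->sb->s_iflags & SB_I_FSNOTIFY_ACCESS_MONITOR) != 0;
}
```

With no watcher on the inode, none on the parent, and an untainted sb,
the helper returns false and the hook can return early without walking
any marks.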

Note that the upcoming pre-content events (a.k.a. HSM) are
certainly going to fall under this "unlikely" category.
If you want to use HSM (on-demand file content filling) on
a filesystem, you will pay the performance penalty for some intensive
io workloads unless you use an fd with FMODE_NONOTIFY.
I think that is to be expected.

HSM is anyway expected to incur an extra performance penalty for
sb_write_barrier() (for the pre-modify events), and in my wip
sb_write_barrier branch [1] there is a similar optimization that does
activate_sb_write_barrier() on the first pre-modify event watch
and then leaves it activated on that sb forever.
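
The "activate once, never deactivate" pattern can be sketched in userspace
like this. It only illustrates the one-way latch and is not the code in
the branch: struct mock_sb_barrier and all mock_*() helpers are made-up
names, and C11 atomics stand in for whatever the kernel side really uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-sb state; not the layout used in the wip branch. */
struct mock_sb_barrier {
	atomic_bool active;        /* one-way latch, never cleared */
	unsigned int nr_watches;   /* current pre-modify watches */
};

static void mock_sb_barrier_init(struct mock_sb_barrier *b)
{
	atomic_init(&b->active, false);
	b->nr_watches = 0;
}

/* Returns true only for the caller that flipped the latch. */
static bool mock_activate_sb_write_barrier(struct mock_sb_barrier *b)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&b->active, &expected, true);
}

/* The first pre-modify watch activates the barrier. */
static void mock_add_pre_modify_watch(struct mock_sb_barrier *b)
{
	b->nr_watches++;
	mock_activate_sb_write_barrier(b);
}

/* Removing the last watch deliberately leaves the barrier active. */
static void mock_remove_pre_modify_watch(struct mock_sb_barrier *b)
{
	if (b->nr_watches > 0)
		b->nr_watches--;
}

/* Write path: a single load decides whether the barrier cost is paid. */
static bool mock_write_needs_barrier(struct mock_sb_barrier *b)
{
	return atomic_load(&b->active);
}
```

The point of never clearing the latch is that the write path stays a
single flag test and there is no deactivation race to reason about, at
the price of a permanent penalty on an sb that was watched once.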

I will try to write this patch.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/sb_write_barrier