> > I am not saying no ;)
> > but it sounds a bit complicated so if the goal is to reduce the overhead
> > of fsnotify_access() and fsnotify_perm(), which I don't think any application
> > cares about, then I'd rather go with a much simpler solution even if it
> > does not cover all the corner cases.
>
> OK, let's figure out what exactly causes slowdown in Jens' case first. I
> agree your solution helps mitigate the cost of fsnotify_access() for reads
> but I foresee people complaining about fsnotify_modify() cost for writes in
> short order :) and there it is not so simple to solve as there's likely
> some watch for FS_MODIFY event somewhere.
>

Actually, I think we may be able to eat the cake and leave it whole.

As I've written to you once in a different context, I think fsnotify has
two mostly non-overlapping use cases:

1. Watch FS_ACCESS/FS_MODIFY on selective inodes (a.k.a. tail)
2. Watch sb/mount/recursive subtree

Anyone who tries watching FS_ACCESS/FS_MODIFY on a large data set will
find it way too noisy to be useful.

For example, we have a filesystem monitor application which needs to know
about changes to any file in the filesystem, but we cannot use FS_MODIFY
for that monitor because it is too noisy, so we use FS_CLOSE_WRITE as
second best.

I guess we cannot rule out that the use case of watching direct children
for FS_ACCESS/FS_MODIFY may exist in the wild.

IOW, I think we should be able to optimize most of the access/modify hooks
by checking those events specifically in the inline helpers for:

1. inode mask
2. parent watching children mask
3. sb->s_iflags & SB_I_FSNOTIFY_ACCESS_MONITOR

The use of such an access monitor is currently unlikely, so I think
tainting the sb is good enough for now.

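To make the three checks above concrete, here is a rough userspace model
in C. It is only a sketch under assumptions: the struct and function names
(model_sb, model_inode, model_event_watched) are made up for illustration,
SB_I_FSNOTIFY_ACCESS_MONITOR is the flag proposed above and its value here
is a placeholder, and the real inline helpers would look at the kernel's
own inode, dentry and super_block state instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Event bits used by this model. */
#define FS_ACCESS          0x00000001u
#define FS_MODIFY          0x00000002u
#define FS_EVENT_ON_CHILD  0x08000000u

/* Hypothetical sb flag from the proposal; placeholder value. */
#define SB_I_FSNOTIFY_ACCESS_MONITOR 0x1u

/* Simplified stand-ins for super_block and inode (assumptions). */
struct model_sb {
	unsigned long s_iflags;
};

struct model_inode {
	uint32_t i_fsnotify_mask;          /* events watched on this inode */
	const struct model_inode *parent;  /* NULL if there is no parent */
	const struct model_sb *sb;
};

/*
 * Cheap check meant to run before the full notification path:
 * return true only if someone may care about @event on @inode.
 */
static inline bool model_event_watched(const struct model_inode *inode,
				       uint32_t event)
{
	const struct model_inode *parent = inode->parent;

	/* 1. inode mask */
	if (inode->i_fsnotify_mask & event)
		return true;

	/* 2. parent watching children mask */
	if (parent &&
	    (parent->i_fsnotify_mask & FS_EVENT_ON_CHILD) &&
	    (parent->i_fsnotify_mask & event))
		return true;

	/* 3. sb tainted by an sb/mount-wide access monitor */
	return (inode->sb->s_iflags & SB_I_FSNOTIFY_ACCESS_MONITOR) != 0;
}
```

With no watches anywhere, all three tests fail and the hook can return
early. Once an sb/mount watch for FS_ACCESS/FS_MODIFY is added, the sb
flag is set and every access/modify on that sb takes the slow path again.
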
Note that the upcoming pre-content events (a.k.a. HSM) are certainly going
to fall under this "unlikely" category: if you want to use HSM (on-demand
file content filling) on a filesystem, you will pay the performance
penalty for some intensive I/O workloads unless you use an fd with
FMODE_NONOTIFY. I think that is to be expected.

HSM is anyway expected to incur an extra performance penalty for
sb_write_barrier() (for the pre-modify events), and in my WIP
sb_write_barrier branch [1] there is a similar optimization that does
activate_sb_write_barrier() on the first pre-modify event watch and then
leaves it activated on that sb forever.

I will try to write this patch.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/sb_write_barrier