On Thu, Jul 9, 2020 at 8:56 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> > > > Otherwise the patch looks good. One observation though: The (mask &
> > > > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > > > "slower" path iterating all mark types and checking whether there are marks
> > > > anyway. That could be relatively simply optimized using a hidden mask flag
> > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > > > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > > > point...
> > >
> > > Yeh that sounds low hanging.
> > > Actually, I Don't think we need to define a flag for that.
> > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.
> >
> > Yes, that would be even more elegant.
> >
> > > I will take a look at that as part of FS_PRE_MODIFY work.
> > > But in general, we should fight the urge to optimize theoretic
> > > performance issues...
> >
> > Agreed. I just suspect this may bring measurable benefit for hackbench pipe
> > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
> > separate idea and without some numbers confirming my suspicion I don't
> > think the complication is worth it so I don't want you to burn time on this
> > unless you're really interested :).
> >
>
> You know me ;-)
> FS_MODIFY optimization pushed to fsnotify_pre_modify branch.
> Only tested that LTP tests pass.
>
> Note that this is only expected to improve performance in case there *are*
> marks, but not marks with ignore mask, because there is an earlier
> optimization in fsnotify() for the no marks case.
>

Hi Mel,

After following up on Jan's suggestion above, I realized there is another
low hanging optimization we can make.
As you may remember, one of the solutions we considered was to exclude
special or internal sb's from notifications based on some SB flag, but
making assumptions about which sb's are expected to provide notifications
turned out to be a risky game.

We can, however, keep a counter on the sb to *know* that there are no
watches on any object in this sb, so the test:

        if (!sb->s_fsnotify_marks &&
            (!mnt || !mnt->mnt_fsnotify_marks) &&
            (!inode || !inode->i_fsnotify_marks))
                return 0;

which is not so nice for inlining, can be summarized as:

        if (atomic_long_read(&inode->i_sb->s_fsnotify_obj_refs) == 0)
                return 0;

which is nicer for inlining.

I am not sure if you had a concrete reason for:
"fs: Do not check if there is a fsnotify watcher on pseudo inodes"
or if you did it for the sport. I have made the change above for the sport
and for now I do not intend to post it for review unless a good reason
comes up.

If you are interested or curious to queue this code to SUSE perf testing,
I pushed it to branch fsnotify-perf [1]. It may be interesting to see if it
wins back the cpu cycles lost in the revert of your patch.

This branch is based on some changes already in Jan's tree and some changes
in my development tree (fsnotify_pre_modify), but you already fed this
development branch to the perf test machine once and reported back that
there was no significant degradation. I can also provide the optimization
patches based on Linus' tree if needed.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fsnotify-perf
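
For readers following the thread, the per-sb counter idea described above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions, not the code on the fsnotify-perf branch: the structs are simplified stand-ins for the kernel ones, the model_* helpers are hypothetical names, C11 atomics stand in for atomic_long_read(), and it assumes the counter is bumped once for every object (sb, mount or inode) that gets marks attached.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel structures (assumption). */
struct super_block {
	void *s_fsnotify_marks;          /* marks attached to the sb itself */
	atomic_long s_fsnotify_obj_refs; /* objects in this sb that have marks */
};

struct mount {
	void *mnt_fsnotify_marks;
};

struct inode {
	struct super_block *i_sb;
	void *i_fsnotify_marks;
};

/* Hypothetical helper: attach marks to an object slot and count the object. */
static void model_attach_marks(struct super_block *sb, void **slot, void *marks)
{
	assert(*slot == NULL && marks != NULL);
	*slot = marks;
	atomic_fetch_add(&sb->s_fsnotify_obj_refs, 1);
}

/* Hypothetical helper: detach marks from an object slot and drop the count. */
static void model_detach_marks(struct super_block *sb, void **slot)
{
	long old;

	assert(*slot != NULL);
	*slot = NULL;
	old = atomic_fetch_sub(&sb->s_fsnotify_obj_refs, 1);
	assert(old > 0);
	(void)old;
}

/* The original three-pointer test: true if any of the objects has marks. */
static bool model_has_watchers_slow(const struct super_block *sb,
				    const struct mount *mnt,
				    const struct inode *inode)
{
	if (!sb->s_fsnotify_marks &&
	    (!mnt || !mnt->mnt_fsnotify_marks) &&
	    (!inode || !inode->i_fsnotify_marks))
		return false;
	return true;
}

/* The single-load test: false only if nothing in this sb is watched. */
static inline bool model_has_watchers_fast(const struct inode *inode)
{
	return atomic_load(&inode->i_sb->s_fsnotify_obj_refs) != 0;
}
```

Note that in this model the fast test is only an early exit: a zero counter guarantees the slow test would also find nothing, but a non-zero counter only says that some object in the sb is watched, so the caller would still need the per-object checks after it.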