On Mon, Jun 8, 2020 at 7:50 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, Jun 08, 2020 at 05:19:43PM +0200, Jan Kara wrote: > > > This is showing that the latencies are improved by roughly 2-9%. The > > > variability is not shown but some of these results are within the noise > > > as this workload heavily overloads the machine. That said, the system CPU > > > usage is reduced by quite a bit so it makes sense to avoid the overhead > > > even if it is a bit tricky to detect at times. A perf profile of just 1 > > > group of tasks showed that 5.14% of samples taken were in either fsnotify() > > > or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify, > > > mostly function entry and the initial check for watchers. The check for > > > watchers is complicated enough that inlining it may be controversial. > > > > > > Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> > > > > Thanks for the patch! I have to tell I'm surprised this small reordering > > helps so much. For pipe inode we will bail on: > > > > if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks && > > (!mnt || !mnt->mnt_fsnotify_marks)) > > return 0; > > > > So what we save with the reordering is sb->s_fsnotify_mask and > > mnt->mnt_fsnotify_mask fetch but that should be the same cacheline as > > sb->s_fsnotify_marks and mnt->mnt_fsnotify_marks, respectively. > > It is likely that the contribution of that change is marginal relative > to the fsnotify_parent() call. I'll know by tomorrow morning at the latest. > > > We also > > save a function call of fsnotify_parent() but I would think that is very > > cheap (compared to the whole write path) as well. > > > > To be fair, it is cheap but with this particular workload, we call > vfs_write() a *lot* and the path is not that long so it builds up to 5% > of samples overall. Given that these were anonymous pipes, it surprised > me to see fsnotify at all which is why I took a closer look. > I should note that after: 7c49b8616460 fs/notify: optimize inotify/fsnotify code for unwatched files Which speaks of a similar workload, the code looked quite similar to your optimization. It was: 60f7ed8c7c4d fsnotify: send path type events to group with super block marks That started accessing ->x_fsnotify_mask before ->x_fsnotify_marks, although I still find it hard to believe that this makes a real difference. Thanks, Amir.