On Mon, Nov 04, 2019 at 07:29:40PM +0800, Shaokun Zhang wrote: > From: Yang Guo <guoyang2@xxxxxxxxxx> > > percpu_counter_compare will be called by xfs_mod_icount/ifree to check > whether the counter less than 0 and it is a expensive function. > let's check it only when delta < 0, it will be good for xfs's performance. Hmmm. I don't recall this as being expensive. How did you find this? Can you please always document how you found the problem being addressed in the commit message so that we don't then have to ask how the problem being fixed is reproduced. > Cc: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx> > Signed-off-by: Yang Guo <guoyang2@xxxxxxxxxx> > Signed-off-by: Shaokun Zhang <zhangshaokun@xxxxxxxxxxxxx> > --- > fs/xfs/xfs_mount.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > index ba5b6f3b2b88..5e8314e6565e 100644 > --- a/fs/xfs/xfs_mount.c > +++ b/fs/xfs/xfs_mount.c > @@ -1174,6 +1174,9 @@ xfs_mod_icount( > int64_t delta) > { > percpu_counter_add_batch(&mp->m_icount, delta, XFS_ICOUNT_BATCH); > + if (delta > 0) > + return 0; > + > if (__percpu_counter_compare(&mp->m_icount, 0, XFS_ICOUNT_BATCH) < 0) { > ASSERT(0); > percpu_counter_add(&mp->m_icount, -delta); I struggle to see how this is expensive when you have more than num_online_cpus() * XFS_ICOUNT_BATCH inodes allocated. __percpu_counter_compare() will always take the fast path so ends up being very little code at all. > @@ -1188,6 +1191,9 @@ xfs_mod_ifree( > int64_t delta) > { > percpu_counter_add(&mp->m_ifree, delta); > + if (delta > 0) > + return 0; > + > if (percpu_counter_compare(&mp->m_ifree, 0) < 0) { > ASSERT(0); > percpu_counter_add(&mp->m_ifree, -delta); This one might have some overhead because the count is often at or around zero, but I haven't noticed it being expensive in kernel profiles when creating/freeing hundreds of thousands of inodes every second. IOWs, we typically measure the overhead of such functions by kernel profile. Creating ~200,000 inodes a second, so hammering the icount and ifree counters, I see: 0.16% [kernel] [k] percpu_counter_add_batch 0.03% [kernel] [k] __percpu_counter_compare Almost nothing - it's way down the long tail of noise in the profile. IOWs, the CPU consumed by percpu_counter_compare() is low that optimisation isn't going to produce any measurable performance improvement. Hence it's not really something we've concerned ourselves about. The profile is pretty much identical for removing hundreds of thousands of files a second, too, so there really isn't any performance gain to be had here. If you want to optimise code to make it faster and show a noticable performance improvement, start by running kernel profiles while your performance critical workload is running. Then look at what the functions and call chains that consume the most CPU and work out how to do them better. Those are the places that optimisation will result in measurable performance gains.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx