Hi Dave, On 2019/11/5 4:49, Dave Chinner wrote: > On Mon, Nov 04, 2019 at 07:29:40PM +0800, Shaokun Zhang wrote: >> From: Yang Guo <guoyang2@xxxxxxxxxx> >> >> percpu_counter_compare will be called by xfs_mod_icount/ifree to check >> whether the counter less than 0 and it is a expensive function. >> let's check it only when delta < 0, it will be good for xfs's performance. > > Hmmm. I don't recall this as being expensive. > Sorry about the misunderstanding information in commit message. > How did you find this? Can you please always document how you found If user creates million of files and the delete them, We found that the __percpu_counter_compare costed 5.78% CPU usage, you are right that itself is not expensive, but it calls __percpu_counter_sum which will use spin_lock and read other cpu's count. perf record -g is used to profile it: - 5.88% 0.02% rm [kernel.vmlinux] [k] xfs_mod_ifree - 5.86% xfs_mod_ifree - 5.78% __percpu_counter_compare 5.61% __percpu_counter_sum > the problem being addressed in the commit message so that we don't > then have to ask how the problem being fixed is reproduced. > >> Cc: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx> >> Signed-off-by: Yang Guo <guoyang2@xxxxxxxxxx> >> Signed-off-by: Shaokun Zhang <zhangshaokun@xxxxxxxxxxxxx> >> --- >> fs/xfs/xfs_mount.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c >> index ba5b6f3b2b88..5e8314e6565e 100644 >> --- a/fs/xfs/xfs_mount.c >> +++ b/fs/xfs/xfs_mount.c >> @@ -1174,6 +1174,9 @@ xfs_mod_icount( >> int64_t delta) >> { >> percpu_counter_add_batch(&mp->m_icount, delta, XFS_ICOUNT_BATCH); >> + if (delta > 0) >> + return 0; >> + >> if (__percpu_counter_compare(&mp->m_icount, 0, XFS_ICOUNT_BATCH) < 0) { >> ASSERT(0); >> percpu_counter_add(&mp->m_icount, -delta); > > I struggle to see how this is expensive when you have more than > num_online_cpus() * XFS_ICOUNT_BATCH inodes allocated. > __percpu_counter_compare() will always take the fast path so ends up > being very little code at all. > >> @@ -1188,6 +1191,9 @@ xfs_mod_ifree( >> int64_t delta) >> { >> percpu_counter_add(&mp->m_ifree, delta); >> + if (delta > 0) >> + return 0; >> + >> if (percpu_counter_compare(&mp->m_ifree, 0) < 0) { >> ASSERT(0); >> percpu_counter_add(&mp->m_ifree, -delta); > > This one might have some overhead because the count is often at or > around zero, but I haven't noticed it being expensive in kernel > profiles when creating/freeing hundreds of thousands of inodes every > second. > > IOWs, we typically measure the overhead of such functions by kernel > profile. Creating ~200,000 inodes a second, so hammering the icount > and ifree counters, I see: > > 0.16% [kernel] [k] percpu_counter_add_batch > 0.03% [kernel] [k] __percpu_counter_compare > 0.03% is just __percpu_counter_compare's usage. > Almost nothing - it's way down the long tail of noise in the > profile. > > IOWs, the CPU consumed by percpu_counter_compare() is low that > optimisation isn't going to produce any measurable performance > improvement. Hence it's not really something we've concerned > ourselves about. The profile is pretty much identical for removing > hundreds of thousands of files a second, too, so there really isn't > any performance gain to be had here. > > If you want to optimise code to make it faster and show a noticable > performance improvement, start by running kernel profiles while your > performance critical workload is running. Then look at what the > functions and call chains that consume the most CPU and work out how > to do them better. Those are the places that optimisation will > result in measurable performance gains.... Hmm, I have done it and I didn't describe this problem clearly, with this patch, 5.78%(__percpu_counter_compare) will disappear. I will follow your method and reduce unnecessary noise. Thanks, Shaokun > > Cheers, > > Dave. >