On 12/19/2017 11:12 AM, Matthew Wilcox wrote:
> On Tue, Dec 19, 2017 at 09:52:27AM -0800, rao.shoaib@xxxxxxxxxx wrote:
>> This patch updates kfree_rcu to use new bulk memory free functions as they
>> are more efficient. It also moves kfree_call_rcu() out of rcu related code to
>> mm/slab_common.c
>>
>> Signed-off-by: Rao Shoaib <rao.shoaib@xxxxxxxxxx>
>> ---
>>  include/linux/mm.h |   5 ++
>>  kernel/rcu/tree.c  |  14 ----
>>  kernel/sysctl.c    |  40 +++++++++++
>>  mm/slab.h          |  23 +++++++
>>  mm/slab_common.c   | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  5 files changed, 264 insertions(+), 16 deletions(-)
>
> You've added an awful lot of code. Do you have any performance measurements
> that shows this to be a win?

I did some micro benchmarking when I was developing the code and did see performance gains -- see attached.
I tried several networking benchmarks but was not able to get any improvement. The reason is that these benchmarks do not exercise the code we are improving. So I looked at the kernel source for users of kfree_rcu(). It turns out that the directory deletion code calls kfree_rcu() to free the data structure when an entry is deleted. Based on that I created two benchmarks.
1) make_dirs -- This benchmark creates a multi-level directory structure and then deletes it. It's the delete part where we see the performance gain, of about 8.3%. The creation time remains the same.
This benchmark was derived from the fdtree benchmark at https://computing.llnl.gov/?set=code&page=sio_downloads ==> https://github.com/llnl/fdtree
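For readers without the attachment, here is a rough sketch of the kind of create-then-delete workload make_dirs exercises. It is not the attached benchmark: the function names make_tree() and remove_tree(), the dN directory naming, and the depth/width parameters are all made up for illustration.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not the attached make_dirs code.
 * Create `width` subdirectories under `path` and recurse `depth` levels.
 * Returns the number of directories created, or -1 on the first error.
 */
long make_tree(const char *path, int depth, int width)
{
    long count = 0;

    if (depth == 0)
        return 0;

    for (int i = 0; i < width; i++) {
        char child[4096];
        int n = snprintf(child, sizeof(child), "%s/d%d", path, i);

        if (n < 0 || (size_t)n >= sizeof(child))
            return -1;              /* path too long */
        if (mkdir(child, 0755) != 0)
            return -1;

        long sub = make_tree(child, depth - 1, width);
        if (sub < 0)
            return -1;
        count += 1 + sub;
    }
    return count;
}

/*
 * Remove the tree built by make_tree(), children first.
 * This is the phase that exercises directory entry deletion.
 * Returns the number of directories removed, or -1 on the first error.
 */
long remove_tree(const char *path, int depth, int width)
{
    long count = 0;

    if (depth == 0)
        return 0;

    for (int i = 0; i < width; i++) {
        char child[4096];
        int n = snprintf(child, sizeof(child), "%s/d%d", path, i);

        if (n < 0 || (size_t)n >= sizeof(child))
            return -1;

        long sub = remove_tree(child, depth - 1, width);
        if (sub < 0)
            return -1;
        if (rmdir(child) != 0)
            return -1;
        count += 1 + sub;
    }
    return count;
}
```

To compare kernels, one would time the make_tree() and remove_tree() calls separately (for example with clock_gettime(CLOCK_MONOTONIC)) on an existing empty directory, since only the delete phase is expected to differ.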
2) tsock -- I also noticed that a socket has an entry in a directory, and when the socket is closed the directory entry is deleted. So I wrote a simple benchmark that loops a million times, opening and closing 10 sockets per iteration. This shows an improvement of 7.6%.
I have attached the benchmarks and results. The "Unchanged" results are for the stock kernel; the "Changed" results are for the modified kernel.
Shoaib
Attachment:
make_dirs.tar
Description: Unix tar archive
Attachment:
tsock.tar
Description: Unix tar archive