* Bagas Sanjaya <bagasdotme@xxxxxxxxx> [231123 00:07]:
> On Wed, Nov 22, 2023 at 08:03:19PM +0000, Chun Ng wrote:
> > Hi,
> >
> > Recently I observed a performance regression in the mmap(..) system call. I tried both vanilla kernels and Raspberry Pi kernels on a Raspberry Pi 4 box and the results are pretty consistent among them.
> >
> > Bisection showed that the regression starts from k-6.1, and the latest vanilla k-6.7 is still showing the same regression.

This is almost certainly the maple tree.  The tree is slower on writes
than the rbtree, so if the benchmark mmaps/munmaps in a tight loop you
will see this slowdown.

What this benchmark measures is the speed of inserting and removing a
VMA, which is not really something that happens in practice - we usually
use the mapping between adding and removing it.

What the maple tree gains us is the ability to remove contention on the
mmap lock during page faults.  If you were to test contention around
that lock, you would see a slowdown until you reach v6.4, where per-vma
locking started to show up.  Benchmarking later versions will show more
types of fault handled outside of the mmap lock, up to (I believe) v6.6,
where most (or all?) types are supported.

Although this is expected, I am still looking to reduce the impact on
any real workloads that may suffer.  I've been reducing the allocations,
for example.

> >
> > The test program calls mmap/munmap for a 4K page with MAP_ANON and MAP_PRIVATE flags, and ftrace is used to measure the time spent on the do_mmap(..) call. Measured times of a sample run with different vanilla kernel versions are:
> > k-5.10 and k-6.0: ~157us
> > k-6.1: ~194us
> > k-6.7: ~214us

I would have expected v6.7 to remain closer to v6.1, but that may depend
on the minor versions you have been testing and what fixes have landed
there.

> > Results are pretty consistent across multiple runs with a small percentage variance. Ftrace shows that latency of mmap_region(...) has increased since k-6.1.
> > For an application that makes frequent mmap(..) calls, the accumulated extra latency is very noticeable.
> >
> > Please find the ftrace results and kernel config files in this folder:
> > https://drive.google.com/drive/folders/1qy8YTBqxu8Gdbs7IigYbSd4FXldId5sd?usp=drive_link
> >
> > The test program can be found in here:
> > https://drive.google.com/file/d/1tG6_BbQMCHwfKebvAIAg_xqbM_lpPcuM/view?usp=sharing
> >
> > Info on the testing environment:
> > cpufreq_governor: performance
> > Test machine: Raspberry Pi 4, 8GB DDR
> > SCHED_FIFO with priority 99 for running the test program
> >
> > Vanilla kernels are not tainted. However on k-6.0 and k-6.7, I have to patch the drivers/clk/bcm/clk-raspberrypi.c file with the version in Raspberry Pi kernel tree for the CPU frequency governor to work.
> >
>
> The next step is to find the commit that introduces your regression with
> `git bisect`. If you haven't done so, see
> Documentation/admin-guide/bug-bisect.rst for instructions.
>
> Anyway, I'm adding this regression to regzbot:
>
> #regzbot ^introduced: v6.0..v6.1
>
> Thanks.
>
> --
> An old man doll... just what I always wanted! - Clara