On Thu, Sep 29, 2016 at 03:08:27PM +0200, Peter Zijlstra wrote:
> > is not racy (the add_wait_queue() will now already guarantee that
> > nobody else clears the bit).
> >
> > Hmm?
>
> Yes. I got my brain in a complete twist, but you're right, that is
> indeed required.
>
> Here's a new version with hopefully clearer comments.
>
> Same caveat about 32bit, naming etc..
>

I was able to run this with basic workloads over the weekend on small UMA
machines. Both machines behaved similarly so I'm only reporting the results
from a single-socket Skylake machine. NUMA machines rarely show anything
much more interesting for this type of workload but, as always, the full
impact is machine- and workload-dependent. Generally, I expect this type of
patch to have a marginal but detectable impact.

This is a workload doing parallel dd of files large enough to trigger
reclaim, which locks/unlocks pages:

paralleldd
                               4.8.0-rc8             4.8.0-rc8
                                 vanilla        waitqueue-v1r2
Amean    Elapsd-1      215.05 (  0.00%)      214.53 (  0.24%)
Amean    Elapsd-3      214.72 (  0.00%)      214.42 (  0.14%)
Amean    Elapsd-5      215.29 (  0.00%)      214.88 (  0.19%)
Amean    Elapsd-7      215.75 (  0.00%)      214.79 (  0.44%)
Amean    Elapsd-8      214.96 (  0.00%)      215.21 ( -0.12%)

That's basically within the noise. Overall CPU usage looks like:

               4.8.0-rc8      4.8.0-rc8
                 vanilla waitqueue-v1r2
User             3409.66        3421.72
System          18298.66       18251.99
Elapsed          7178.82        7181.14

Marginal decrease in system CPU usage. Profiles showed the vanilla kernel
spending less than 0.1% of its time in unlock_page, and that is eliminated
by the patch.

This is some microbenchmarks from the vm-scalability benchmark. It's
similar to dd in that it triggers reclaim from a single thread:

vmscale
                                            4.8.0-rc8             4.8.0-rc8
                                              vanilla        waitqueue-v1r2
Ops lru-file-mmap-read-elapsed     19.50 (  0.00%)      19.43 (  0.36%)
Ops lru-file-readonce-elapsed      12.44 (  0.00%)      12.29 (  1.21%)
Ops lru-file-readtwice-elapsed     22.27 (  0.00%)      22.19 (  0.36%)
Ops lru-memcg-elapsed              12.18 (  0.00%)      12.00 (  1.48%)

               4.8.0-rc8      4.8.0-rc8
                 vanilla waitqueue-v1r2
User               50.54          50.88
System            398.72         388.81
Elapsed            69.48          68.99

Again, the differences are marginal but detectable. I accidentally did not
collect profile data but I have no reason to believe it's significantly
different to dd.

This is "gitsource" from mmtests. It's a checkout of the git source tree
and a run of "make test", which is where Linus first noticed the problem.
The metric here is time-based; I don't actually check the results of the
regression suite.

gitsource
                                4.8.0-rc8             4.8.0-rc8
                                  vanilla        waitqueue-v1r2
User     min          192.28 (  0.00%)      192.49 ( -0.11%)
User     mean         193.55 (  0.00%)      194.88 ( -0.69%)
User     stddev         1.52 (  0.00%)        2.39 (-57.58%)
User     coeffvar       0.79 (  0.00%)        1.23 (-56.51%)
User     max          196.34 (  0.00%)      199.06 ( -1.39%)
System   min          122.70 (  0.00%)      118.69 (  3.27%)
System   mean         123.87 (  0.00%)      120.68 (  2.57%)
System   stddev         0.84 (  0.00%)        1.65 (-97.67%)
System   coeffvar       0.67 (  0.00%)        1.37 (-102.89%)
System   max          124.95 (  0.00%)      123.14 (  1.45%)
Elapsed  min          718.09 (  0.00%)      711.48 (  0.92%)
Elapsed  mean         724.23 (  0.00%)      716.52 (  1.07%)
Elapsed  stddev         4.20 (  0.00%)        4.84 (-15.42%)
Elapsed  coeffvar       0.58 (  0.00%)        0.68 (-16.66%)
Elapsed  max          730.51 (  0.00%)      724.45 (  0.83%)

               4.8.0-rc8      4.8.0-rc8
                 vanilla waitqueue-v1r2
User             2730.60        2808.48
System           2184.85        2108.68
Elapsed          9938.01        9929.56

Overall, it's showing a drop in system CPU usage as expected. The detailed
results show a drop of 2.57% in system CPU usage running the benchmark
itself and 3.48% overall, which measures everything and not just "make
test". The drop in elapsed time is marginal but measurable.
It may raise an eyebrow that the overall elapsed time doesn't match the
detailed results. The detailed results report 5 iterations of "make test"
without profiling enabled, which takes about an hour. The way I configured
it, the profiled run happened immediately afterwards and it's much slower,
as well as having to compile git itself, which takes a few minutes.

This is the top lock/unlock activity in the vanilla kernel:

     0.80%  git            [kernel.vmlinux]    [k] unlock_page
     0.28%  sh             [kernel.vmlinux]    [k] unlock_page
     0.20%  git-rebase     [kernel.vmlinux]    [k] unlock_page
     0.13%  git            [kernel.vmlinux]    [k] lock_page_memcg
     0.10%  git            [kernel.vmlinux]    [k] unlock_page_memcg
     0.07%  git-submodule  [kernel.vmlinux]    [k] unlock_page
     0.04%  sh             [kernel.vmlinux]    [k] lock_page_memcg
     0.03%  git-rebase     [kernel.vmlinux]    [k] lock_page_memcg
     0.03%  sh             [kernel.vmlinux]    [k] unlock_page_memcg
     0.03%  sed            [kernel.vmlinux]    [k] unlock_page
     0.03%  perf           [kernel.vmlinux]    [k] unlock_page
     0.02%  git-rebase     [kernel.vmlinux]    [k] unlock_page_memcg
     0.02%  rm             [kernel.vmlinux]    [k] unlock_page
     0.02%  git-stash      [kernel.vmlinux]    [k] unlock_page
     0.02%  git-bisect     [kernel.vmlinux]    [k] unlock_page
     0.02%  diff           [kernel.vmlinux]    [k] unlock_page
     0.02%  cat            [kernel.vmlinux]    [k] unlock_page
     0.02%  wc             [kernel.vmlinux]    [k] unlock_page
     0.01%  mv             [kernel.vmlinux]    [k] unlock_page
     0.01%  git-submodule  [kernel.vmlinux]    [k] lock_page_memcg

This is with the patch applied:

     0.49%  git            [kernel.vmlinux]    [k] unlock_page
     0.14%  sh             [kernel.vmlinux]    [k] unlock_page
     0.13%  git            [kernel.vmlinux]    [k] lock_page_memcg
     0.11%  git-rebase     [kernel.vmlinux]    [k] unlock_page
     0.10%  git            [kernel.vmlinux]    [k] unlock_page_memcg
     0.04%  sh             [kernel.vmlinux]    [k] lock_page_memcg
     0.04%  git-submodule  [kernel.vmlinux]    [k] unlock_page
     0.03%  sh             [kernel.vmlinux]    [k] unlock_page_memcg
     0.03%  git-rebase     [kernel.vmlinux]    [k] lock_page_memcg
     0.02%  git-rebase     [kernel.vmlinux]    [k] unlock_page_memcg
     0.02%  sed            [kernel.vmlinux]    [k] unlock_page
     0.01%  rm             [kernel.vmlinux]    [k] unlock_page
     0.01%  git-stash      [kernel.vmlinux]    [k] unlock_page
     0.01%  git-submodule  [kernel.vmlinux]    [k] lock_page_memcg
     0.01%  git-bisect     [kernel.vmlinux]    [k] unlock_page
     0.01%  diff           [kernel.vmlinux]    [k] unlock_page
     0.01%  cat            [kernel.vmlinux]    [k] unlock_page
     0.01%  wc             [kernel.vmlinux]    [k] unlock_page
     0.01%  git-submodule  [kernel.vmlinux]    [k] unlock_page_memcg
     0.01%  mv             [kernel.vmlinux]    [k] unlock_page

The drop in time spent by git in unlock_page is noticeable. I did not
drill down into the annotated profile, but this roughly matches what I
measured before when avoiding page_waitqueue lookups.

The full profile is not exactly great but I didn't see anything in there
that I haven't seen before. The top entries with the patch applied look
like this:

     7.44%  swapper  [kernel.vmlinux]  [k] intel_idle
     1.25%  git      [kernel.vmlinux]  [k] filemap_map_pages
     1.08%  git      [kernel.vmlinux]  [k] native_irq_return_iret
     0.79%  git      [kernel.vmlinux]  [k] unmap_page_range
     0.56%  git      [kernel.vmlinux]  [k] release_pages
     0.51%  git      [kernel.vmlinux]  [k] handle_mm_fault
     0.49%  git      [kernel.vmlinux]  [k] unlock_page
     0.46%  git      [kernel.vmlinux]  [k] page_remove_rmap
     0.46%  git      [kernel.vmlinux]  [k] _raw_spin_lock
     0.42%  git      [kernel.vmlinux]  [k] clear_page_c_e

Lots of map/unmap activity, as you'd expect, and release_pages being a pig
as usual.

Overall, this patch shows similar behaviour to my own patch from 2014.
There is a definite benefit but it's marginal. The big difference is that
this patch is a lot simpler than the 2014 version and may meet less
resistance as a result.
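As an aside for anyone skimming the thread, below is a rough userspace
model of the idea being measured. It is only a sketch and not the kernel
patch itself; the names (fake_page, PG_LOCKED, PG_WAITERS, wake_up_waiters
and so on) are made up, and it dodges the hard question of who clears the
waiters bit and when, which is what the quoted add_wait_queue() discussion
at the top is about. The point is simply that the unlocker only pays for
the hashed wait-queue lookup and wakeup when a waiters bit says somebody is
actually sleeping:

/*
 * Minimal userspace model of the unlock fast path, NOT the kernel patch.
 * It punts on multiple waiters and on the ordering that add_wait_queue()
 * provides in the real thing, and simply clears the waiters bit on unlock.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define PG_LOCKED   (1u << 0)
#define PG_WAITERS  (1u << 1)

struct fake_page {
        _Atomic unsigned int flags;
};

/* Stand-in for the page_waitqueue() hash lookup plus wake_up(). */
static void wake_up_waiters(struct fake_page *page)
{
        (void)page;
        printf("slow path: hashed wait-queue lookup and wakeup\n");
}

static bool trylock(struct fake_page *page)
{
        unsigned int old = atomic_fetch_or(&page->flags, PG_LOCKED);

        return !(old & PG_LOCKED);
}

/*
 * A real waiter sets the bit before re-checking PG_LOCKED and sleeping
 * (in the kernel, ordered by the wait-queue lock taken in
 * add_wait_queue()), so the unlocker below cannot miss it.
 */
static void note_waiter(struct fake_page *page)
{
        atomic_fetch_or(&page->flags, PG_WAITERS);
}

static void unlock(struct fake_page *page)
{
        /* One atomic op drops the lock and tells us if anyone was waiting. */
        unsigned int old = atomic_fetch_and(&page->flags,
                                            ~(PG_LOCKED | PG_WAITERS));

        if (old & PG_WAITERS)
                wake_up_waiters(page);
        else
                printf("fast path: no waiters, no wait-queue lookup\n");
}

int main(void)
{
        struct fake_page page = { .flags = 0 };

        trylock(&page);
        unlock(&page);          /* uncontended unlock: fast path */

        trylock(&page);
        note_waiter(&page);     /* pretend someone queued themselves */
        unlock(&page);          /* contended unlock: slow path with wakeup */

        return 0;
}

Compiling and running it prints the fast path for the uncontended unlock
and the slow path for the contended one; skipping that slow path in the
common case is the whole saving the unlock_page profiles above are showing.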
-- 
Mel Gorman
SUSE Labs