On Mon, Dec 14, 2020 at 03:37:46PM +0000, Matthew Wilcox wrote:
> On Mon, Dec 14, 2020 at 04:11:28PM +0100, Uladzislau Rezki wrote:
> > On Sun, Dec 13, 2020 at 09:51:34PM +0000, Matthew Wilcox wrote:
> > > If we need to iterate the list efficiently, i'd suggest getting rid of
> > > the list and using an xarray instead. maybe a maple tree, once that code
> > > is better exercised.
> >
> > Not really efficiently. We need just a full scan of it propagating the
> > information about mapped and un-purged areas to user space applications.
> >
> > For example RCU-safe list is what we need, IMHO. From the other hand i
> > am not sure if xarray is RCU safe in a context of concurrent removing/adding
> > an element (xa_remove()/xa_insert()) and scanning like xa_for_each_XXX().
>
> It's as RCU safe as an RCU-safe list. Specifically, it guarantees:
>
>  - If an element is present at all times between the start and the
>    end of the iteration, it will appear in the iteration.
>  - No element will appear more than once.
>  - No element will appear in the iteration that was never present.
>  - The iteration will terminate.
>
> If an element is added or removed between the start and end of the
> iteration, it may or may not appear. Causality is not guaranteed (eg
> if modification A is made before modification B, modification B may
> be reflected in the iteration while modification A is not).
>
Thank you for the information!

Making use of an xarray would require migrating from our current
vmap_area_root RB-tree to the xarray. That migration probably only makes
sense if it brings performance benefits. However, running the vmalloc
benchmark shows quite a big degradation:

# X-array
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=31 single_cpu_test=1
Run the test with following parameters: run_test_mask=31 single_cpu_test=1
Done.
Check the kernel ring buffer to see the summary.
real    0m18.928s
user    0m0.017s
sys     0m0.004s
urezki@pc638:~$
[   90.103768] Summary: fix_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1275773 usec
[   90.103771] Summary: full_fit_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1439371 usec
[   90.103772] Summary: long_busy_list_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 9138051 usec
[   90.103773] Summary: random_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 4821400 usec
[   90.103774] Summary: fix_align_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 2181207 usec
[   90.103775] All test took CPU0=69774784667 cycles

# RB-tree
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=31 single_cpu_test=1
Run the test with following parameters: run_test_mask=31 single_cpu_test=1
Done.
Check the kernel ring buffer to see the summary.

real    0m13.975s
user    0m0.013s
sys     0m0.010s
urezki@pc638:~$
[   26.633372] Summary: fix_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 429836 usec
[   26.633375] Summary: full_fit_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 566042 usec
[   26.633377] Summary: long_busy_list_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 7663974 usec
[   26.633378] Summary: random_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 3853388 usec
[   26.633379] Summary: fix_align_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1370097 usec
[   26.633380] All test took CPU0=51370095742 cycles

I suspect xa_load() provides O(log(n)) search time?

--
Vlad Rezki
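P.S. For reference, the RCU-safe scan discussed above would look roughly
like the sketch below. This is only an illustration of the xa_for_each()
pattern the stated guarantees apply to; the "vmap_xa" xarray, the use of
va_start as the index, and the reporting callback are assumptions, not
the current mainline layout:

```c
/*
 * Sketch only: a full scan of vmap areas under RCU, concurrent with
 * xa_insert()/xa_erase() from other CPUs. The names "vmap_xa" and
 * "report_area_to_user()" are hypothetical.
 */
#include <linux/xarray.h>
#include <linux/rcupdate.h>
#include <linux/vmalloc.h>	/* struct vmap_area */

static DEFINE_XARRAY(vmap_xa);	/* indexed by va_start, for example */

static void scan_vmap_areas(void)
{
	struct vmap_area *va;
	unsigned long index;

	rcu_read_lock();
	/*
	 * Per the guarantees above: every entry present for the whole
	 * walk is seen exactly once, nothing never-present is seen,
	 * and the iteration terminates. Entries inserted or erased
	 * concurrently may or may not be seen.
	 */
	xa_for_each(&vmap_xa, index, va) {
		/* propagate the mapped/un-purged range to user space */
	}
	rcu_read_unlock();
}
```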