On Mon, Dec 14, 2020 at 03:37:46PM +0000, Matthew Wilcox wrote:
> On Mon, Dec 14, 2020 at 04:11:28PM +0100, Uladzislau Rezki wrote:
> > On Sun, Dec 13, 2020 at 09:51:34PM +0000, Matthew Wilcox wrote:
> > > If we need to iterate the list efficiently, i'd suggest getting rid of
> > > the list and using an xarray instead. maybe a maple tree, once that code
> > > is better exercised.
> >
> > Not really efficiently. We need just a full scan of it propagating the
> > information about mapped and un-purged areas to user space applications.
> >
> > For example RCU-safe list is what we need, IMHO. From the other hand i
> > am not sure if xarray is RCU safe in a context of concurrent removing/adding
> > an element (xa_remove()/xa_insert()) and scanning like xa_for_each_XXX().
>
> It's as RCU safe as an RCU-safe list. Specifically, it guarantees:
>
>  - If an element is present at all times between the start and the
>    end of the iteration, it will appear in the iteration.
>  - No element will appear more than once.
>  - No element will appear in the iteration that was never present.
>  - The iteration will terminate.
>
> If an element is added or removed between the start and end of the
> iteration, it may or may not appear. Causality is not guaranteed (eg
> if modification A is made before modification B, modification B may
> be reflected in the iteration while modification A is not).
>
Thank you for the information!

Making use of an xarray would require migrating from our current
vmap_area_root RB-tree to the xarray. That migration probably only makes
sense if it brings performance benefits. However, running the vmalloc
benchmark shows quite a big degradation:

# X-array
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=31 single_cpu_test=1
Run the test with following parameters: run_test_mask=31 single_cpu_test=1
Done.
Check the kernel ring buffer to see the summary.
real    0m18.928s
user    0m0.017s
sys     0m0.004s
urezki@pc638:~$
[   90.103768] Summary: fix_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1275773 usec
[   90.103771] Summary: full_fit_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1439371 usec
[   90.103772] Summary: long_busy_list_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 9138051 usec
[   90.103773] Summary: random_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 4821400 usec
[   90.103774] Summary: fix_align_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 2181207 usec
[   90.103775] All test took CPU0=69774784667 cycles

# RB-tree
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=31 single_cpu_test=1
Run the test with following parameters: run_test_mask=31 single_cpu_test=1
Done.
Check the kernel ring buffer to see the summary.

real    0m13.975s
user    0m0.013s
sys     0m0.010s
urezki@pc638:~$
[   26.633372] Summary: fix_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 429836 usec
[   26.633375] Summary: full_fit_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 566042 usec
[   26.633377] Summary: long_busy_list_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 7663974 usec
[   26.633378] Summary: random_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 3853388 usec
[   26.633379] Summary: fix_align_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1370097 usec
[   26.633380] All test took CPU0=51370095742 cycles

I suspect xa_load() provides O(log(n)) search time?

--
Vlad Rezki
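P.S. For reference, the RCU-safe scan discussed above would look roughly
like the sketch below. This is only an illustration of the xa_for_each()
pattern the stated guarantees apply to; the "vmap_xa" xarray, the use of
va_start as the index, and the reporting callback are assumptions, not
the current mainline layout:

```c
/*
 * Sketch only: a full scan of vmap areas under RCU, concurrent with
 * xa_insert()/xa_erase() from other CPUs. The names "vmap_xa" and
 * "report_area_to_user()" are hypothetical.
 */
#include <linux/xarray.h>
#include <linux/rcupdate.h>
#include <linux/vmalloc.h>	/* struct vmap_area */

static DEFINE_XARRAY(vmap_xa);	/* indexed by va_start, for example */

static void scan_vmap_areas(void)
{
	struct vmap_area *va;
	unsigned long index;

	rcu_read_lock();
	/*
	 * Per the guarantees above: every entry present for the whole
	 * walk is seen exactly once, nothing never-present is seen,
	 * and the iteration terminates. Entries inserted or erased
	 * concurrently may or may not be seen.
	 */
	xa_for_each(&vmap_xa, index, va) {
		/* propagate the mapped/un-purged range to user space */
	}
	rcu_read_unlock();
}
```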