Re: [PATCH 0/9] Mitigate a vmap lock contention

Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> · Tue, 23 May 2023 20:59:05 +0900

On Mon, May 22, 2023 at 01:08:40PM +0200, Uladzislau Rezki (Sony) wrote:
> Hello, folk.
> 
> 1. This is a follow-up to the vmap topic that was highlighted at the LSFMMBPF-2023
> conference. This small series attempts to mitigate the contention across the
> vmap/vmalloc code. The problem is described here:
> 

Hello Uladzislau, thank you for the work!

> wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf

I ran exactly the same command but couldn't download the file. Did I
miss something?

$ wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf
[...]
==> PASV ... done.    ==> RETR Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf ... 
No such file `Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf'.

> The material is tagged as a v2 version. It contains extra slides about testing
> the throughput, steps and comparison with a current approach.
> 
> 2. Motivation.
> 
> - The vmap code does not scale with the number of CPUs and this should be fixed;
> - XFS folks have complained several times that vmalloc might be contended on
>   their workloads:
> 
> <snip>
> commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> Author: Dave Chinner <dchinner@xxxxxxxxxx>
> Date:   Tue Jan 4 17:22:18 2022 -0800
> 
>     xfs: reduce kvmalloc overhead for CIL shadow buffers
>     
>     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
>     
>     The problem is when we are logging lots of large objects, we hit
>     kvmalloc really damn hard with costly order allocations, and
>     behaviour utterly sucks:

Based on the commit, I guess the reason XFS needs vmalloc/kvmalloc is
that it allocates large buffers. How large can they get?

> 3. Test
> 
> On my AMD Ryzen Threadripper 3970X 32-Core Processor, I have the figures below:
> 
>     1-page     1-page-this-patch
> 1  0.576131   vs   0.555889
> 2   2.68376   vs    1.07895
> 3   4.26502   vs    1.01739
> 4   6.04306   vs    1.28924
> 5   8.04786   vs    1.57616
> 6   9.38844   vs    1.78142

<snip>

> 29    20.06   vs    3.59869
> 30  20.4353   vs     3.6991
> 31  20.9082   vs    3.73028
> 32  21.0865   vs    3.82904
> 
> 1..32 is the number of jobs. The results are in usec and show the
> vmalloc()/vfree() pair throughput.
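
For a rough sense of scale, the quoted figures can be turned into ratios.
Below is a minimal userspace sketch, assuming the values are usec per
vmalloc()/vfree() pair (so lower is better). The struct and helper names are
hypothetical, and only the rows that survived the snip are included. On these
numbers the ratio goes from about 1.04x at 1 job to about 5.5x at 32 jobs.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the quoted table: job count and usec per vmalloc()/vfree() pair. */
struct sample {
	int jobs;
	double base_usec;    /* "1-page" column */
	double patched_usec; /* "1-page-this-patch" column */
};

/* Rows copied from the quoted results; the snipped rows (7..28) are omitted. */
static const struct sample samples[] = {
	{  1, 0.576131, 0.555889 },
	{  2, 2.68376,  1.07895  },
	{  3, 4.26502,  1.01739  },
	{  4, 6.04306,  1.28924  },
	{  5, 8.04786,  1.57616  },
	{  6, 9.38844,  1.78142  },
	{ 29, 20.06,    3.59869  },
	{ 30, 20.4353,  3.6991   },
	{ 31, 20.9082,  3.73028  },
	{ 32, 21.0865,  3.82904  },
};

/* Hypothetical helper: how many times faster the patched kernel is per pair. */
static double speedup(double base_usec, double patched_usec)
{
	assert(patched_usec > 0.0);
	return base_usec / patched_usec;
}

/* Print one line per row: job count and baseline/patched ratio. */
static void print_speedups(void)
{
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("%2d jobs: %.2fx\n", samples[i].jobs,
		       speedup(samples[i].base_usec, samples[i].patched_usec));
}
```

Calling print_speedups() from main() prints the ratio for each listed job
count. This only restates the synthetic numbers, it says nothing about real
workloads.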

I would be more interested in real numbers than in synthetic benchmarks.
Maybe the XFS folks could help by profiling, similar to commit 8dc9384b7d750,
with and without this patchset?

By the way, looking at the commit, teaching __p?d_alloc() about the gfp
context (which I'm _slowly_ working on...) could be nice for allowing
non-GFP_KERNEL kvmalloc allocations, as Matthew mentioned. [1]

Thanks!

[1] https://lore.kernel.org/linux-mm/Y%2FOHC33YLedMXTlD@xxxxxxxxxxxxxxxxxxxx

-- 
Hyeonggon Yoo

Doing kernel stuff as a hobby
Undergraduate | Chungnam National University
Dept. Computer Science & Engineering