On Thu, Apr 8, 2021 at 1:54 PM Roman Gushchin <guro@xxxxxx> wrote:
>
> On Thu, Apr 08, 2021 at 03:39:48PM -0400, Masayoshi Mizuma wrote:
> > Hello,
> >
> > I detected a performance degradation issue for a benchmark of PostgreSQL [1],
> > and the issue seems to be related to object level memory cgroup [2].
> > I would appreciate it if you could give me some ideas to solve it.
> >
> > The benchmark shows the transaction per second (tps) and the tps for v5.9
> > and later kernel get about 10%-20% smaller than v5.8.
> >
> > The benchmark does sendto() and recvfrom() system calls repeatedly,
> > and the duration of the system calls get longer than v5.8.
> > The result of perf trace of the benchmark is as follows:
> >
> >   - v5.8
> >
> >    syscall            calls errors    total       min       avg       max stddev
> >                                      (msec)    (msec)    (msec)    (msec)    (%)
> >    --------------- -------- ------ -------- --------- --------- --------- ------
> >    sendto            699574      0 2595.220     0.001     0.004     0.462  0.03%
> >    recvfrom         1391089 694427 2163.458     0.001     0.002     0.442  0.04%
> >
> >   - v5.9
> >
> >    syscall            calls errors    total       min       avg       max stddev
> >                                      (msec)    (msec)    (msec)    (msec)    (%)
> >    --------------- -------- ------ -------- --------- --------- --------- ------
> >    sendto            699187      0 3316.948     0.002     0.005     0.044  0.02%
> >    recvfrom         1397042 698828 2464.995     0.001     0.002     0.025  0.04%
> >
> >   - v5.12-rc6
> >
> >    syscall            calls errors    total       min       avg       max stddev
> >                                      (msec)    (msec)    (msec)    (msec)    (%)
> >    --------------- -------- ------ -------- --------- --------- --------- ------
> >    sendto            699445      0 3015.642     0.002     0.004     0.027  0.02%
> >    recvfrom         1395929 697909 2338.783     0.001     0.002     0.024  0.03%
> >

Can you please explain how to read these numbers? Or at least put a %
regression.

> > I bisected the kernel patches, then I found the patch series, which add
> > object level memory cgroup support, causes the degradation.
> >
> > I confirmed the delay with a kernel module which just runs
> > kmem_cache_alloc/kmem_cache_free as follows. The duration is about
> > 2-3 times that of v5.8.
> >
> >    dummy_cache = KMEM_CACHE(dummy, SLAB_ACCOUNT);
> >    for (i = 0; i < 100000000; i++)
> >    {
> >            p = kmem_cache_alloc(dummy_cache, GFP_KERNEL);
> >            kmem_cache_free(dummy_cache, p);
> >    }
> >
> > It seems that the object accounting work in slab_pre_alloc_hook() and
> > slab_post_alloc_hook() is the overhead.
> >
> > cgroup.nokmem kernel parameter doesn't work for my case because it disables
> > all of kmem accounting.

The patch is somewhat doing that i.e. disabling memcg accounting for slab.

> >
> > The degradation is gone when I apply a patch (at the bottom of this email)
> > that adds a kernel parameter that expects to fallback to the page level
> > accounting, however, I'm not sure it's a good approach though...
>
> Hello Masayoshi!
>
> Thank you for the report!
>
> It's not a secret that per-object accounting is more expensive than a per-page
> allocation. I had micro-benchmark results similar to yours: accounted
> allocations are about 2x slower. But in general it tends to not affect real
> workloads, because the cost of allocations is still low and tends to be only
> a small fraction of the whole cpu load. And because it brings up significant
> benefits: 40%+ slab memory savings, less fragmentation, more stable workingset,
> etc, real workloads tend to perform on par or better.
>
> So my first question is if you see the regression in any real workload
> or it's only about the benchmark?
>
> Second, I'll try to take a look into the benchmark to figure out why it's
> affected so badly, but I'm not sure we can easily fix it. If you have any
> ideas what kind of objects the benchmark is allocating in big numbers,
> please let me know.
>

One idea would be to increase MEMCG_CHARGE_BATCH.