Re: [PATCH v4 00/11] Multigenerational LRU Framework

bot@edi.works · Thu, 21 Oct 2021 12:41:03 -0700

Kernel / Memcached benchmark with MGLRU

TLDR
====
With the MGLRU, Memcached achieved 95% CIs [23.54, 32.25]%, [20.76,
41.61]%, [13.85, 15.97]%, [21.59, 30.02]% and [23.94, 29.92]% more
operations per second (OPS), respectively, for sequential access w/
THP=always, random access w/ THP=always, random access w/ THP=never,
Gaussian access w/ THP=always and Gaussian access w/ THP=never. There
were no statistically significant changes in OPS for sequential
access w/ THP=never.

Background
==========
Memory overcommit can increase utilization and, if carried out
properly, can also increase throughput. The challenges are to improve
working set estimation and to optimize page reclaim. The risks are
performance degradations and OOM kills. Short of overcoming the
challenges, the only way to reduce the risks is to underutilize
memory.

Memcached is one of the most popular open-source in-memory KV stores.
memtier_benchmark is the leading open-source KV store benchmarking
software that supports multiple access patterns. THP can have a
negative effect under memory pressure, due to internal and/or
external fragmentations.

Matrix
======
Kernels: version [+ patchset]
* Baseline: 5.14
* Patched: 5.14 + MGLRU

Memory conditions: % of memory size
* Underutilizing: N/A
* Overcommitting: ~10% swapped out (zram)

THP (2MB Transparent Huge Pages):
* Always
* Never

Read patterns (2kB objects):
* Parallel sequential
* Uniform random
* Gaussian (SD = 1/6 of key range)

Total configurations: 12
Data points per configuration: 10
Total run duration (minutes) per data point: ~20

Note that the goal of this benchmark is to compare the performance
for the same key range, object size, and hit ratio. Since Memcached
does not support backing storage, it requires fewer in-memory objects
to underutilize memory, which reduces the hit ratio and therefore is
not applicable in this case.

Procedure
=========
The latest MGLRU patchset for the 5.14 kernel is available at
git fetch https://linux-mm.googlesource.com/page-reclaim \
    refs/changes/30/1430/1

Baseline and patched 5.14 kernel images are available at
https://drive.google.com/drive/folders/1eMkQleAFGkP2vzM_JyRA21oKE0ESHBqp

<install and configure OS>

<for each kernel>
    grub2-set-default <baseline, patched>
    <for each THP setting>
        echo <always, never> > \
            /sys/kernel/mm/transparent_hugepage/enabled
        <update /etc/sysconfig/memcached>
        <for each access pattern>
            <update run_memtier.sh>
            <for each data point>
                reboot
                run_memtier.sh
                <collect OPS>

Hardware
========
Memory (GB): 64
CPU (total #): 32
NVMe SSD (GB): 1024

OS
==
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 (Ootpa)

$ cat /proc/swaps
Filename          Type          Size          Used     Priority
/dev/zram0        partition     8388604       0        -2

$ cat /proc/cmdline
<existing parameters> systemd.unified_cgroup_hierarchy=1

$ cat /sys/fs/cgroup/user.slice/memory.min
4294967296

$ cat /proc/sys/vm/overcommit_memory
1

Memcached
=========
$ memcached -V
memcached 1.5.22

$ cat /etc/sysconfig/memcached
USER="memcached"
MAXCONN="10000"
CACHESIZE="65536"
OPTIONS="-s /tmp/memcached.sock -a 0766 -t 16 -b 10000 -B binary <-L>"
memtier_benchmark
$ memtier_benchmark -v
memtier_benchmark 1.3.0
Copyright (C) 2011-2020 Redis Labs Ltd.
This is free software.  You may redistribute copies of it under the
terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

$ cat run_memtier.sh
# load objects
memtier_benchmark -S /tmp/memcached.sock -P memcache_binary -n
allkeys -c 1 -t 16 --ratio 1:0 --pipeline 1 -d 2000 --key-minimum=1
--key-maximum=30000000 --key-pattern=P:P

# run benchmark
memtier_benchmark -S /tmp/memcached.sock -P memcache_binary -n
30000000 -c 1 -t 16 --ratio 0:1 --pipeline 1 --randomize
--distinct-client-seed --key-minimum=1 --key-maximum=30000000
--key-pattern=<P:P, R:R, G:G>

Results
=======
Comparing the patched with the baseline kernel, Memcached achieved
95% CIs [23.54, 32.25]%, [20.76, 41.61]%, [13.85, 15.97]%, [21.59,
30.02]% and [23.94, 29.92]% more OPS, respectively, for sequential
access w/ THP=always, random access w/ THP=always, random access w/
THP=never, Gaussian access w/ THP=always and Gaussian access w/
THP=never. There were no statistically significant changes in OPS for
sequential access w/ THP=never.

+-------------------+-----------------------+------------------------+
| Mean OPS [95% CI] | THP=always            | THP=never              |
+-------------------+-----------------------+------------------------+
| Sequential access | 519599.7 / 664543.2   | 525394.8 / 527170.6    |
|                   | [122297.9, 167589.0]  | [-15138.63, 18690.31]  |
+-------------------+-----------------------+------------------------+
| Random access     | 450033.2 / 590360.7   | 509237.3 / 585142.4    |
|                   | [93415.59, 187239.37] | [70504.51, 81305.60]   |
+-------------------+-----------------------+------------------------+
| Gaussian access   | 481182.4 / 605358.7   | 531270.8 / 674341.4    |
|                   | [103892.6, 144460.0]] | [127199.8, 158941.2]   |
+-------------------+-----------------------+------------------------+
Table 1. Comparison between the baseline and patched kernels

Comparing THP=never with THP=always, Memcached achieved 95% CIs
[2.73, 23.58]% and [5.45, 15.37]% more OPS, respectively, for random
access and Gaussian access when using the baseline kernel; 95% CIs
[-22.65, -18.69]% and [10.67, 12.12]% more OPS, respectively, for
sequential access and Gaussian access when using the patched kernel.
There were no statistically significant changes in OPS under other
conditions.

+-------------------+-----------------------+------------------------+
| Mean OPS [95% CI] | Baseline kernel       |  Patched kernel        |
+-------------------+-----------------------+------------------------+
| Sequential access | 519599.7 / 525394.8   | 664543.2 / 527170.6    |
|                   | [-18739.71, 30329.80] | [-150551.0, -124194.1] |
+-------------------+-----------------------+------------------------+
| Random access     | 450033.2 / 509237.3   | 590360.7 / 585142.4    |
|                   | [12303.49, 106104.69] | [-10816.1516, 379.475] |
+-------------------+-----------------------+------------------------+
| Gaussian access   | 481182.4 / 531270.8   | 605358.7 / 674341.4    |
|                   | [26229.02, 73947.84]  | [64570.58, 73394.70]   |
+-------------------+-----------------------+------------------------+
Table 2. Comparison between THP=always and THP=never

Metrics collected during each run are available at
https://github.com/ediworks/KernelPerf/tree/master/mglru/memcached/5.14

References
==========
memtier_benchmark: A High-Throughput Benchmarking Tool for Redis &
Memcached
https://redis.com/blog/memtier_benchmark-a-high-throughput-benchmarking-tool-for-redis-memcached/

Appendix
========
$ cat raw_data.r
v <- c(
    # baseline THP=always sequential
    460266.29, 466497.70, 516145.38, 523474.39, 528507.72, 529481.86, 533867.92, 537028.56, 546027.45, 554699.89,
    # baseline THP=always random
    371470.66, 378967.63, 381137.01, 385205.60, 449100.72, 474670.76, 490470.46, 513341.53, 525159.49, 530808.55,
    # baseline THP=always Gaussian
    455674.14, 457089.50, 460001.46, 463269.94, 468283.00, 474169.61, 477684.67, 506331.96, 507875.30, 541444.54,
    # baseline THP=never sequential
    501887.04, 507303.10, 509573.54, 515222.79, 517429.04, 530805.74, 536490.44, 538088.45, 540459.92, 556687.57,
    # baseline THP=never random
    496489.97, 506444.42, 508002.80, 508707.39, 509746.28, 511157.58, 511897.57, 511926.06, 512652.28, 515348.95,
    # baseline THP=never Gaussian
    493199.15, 504207.48, 518781.40, 520536.21, 528619.45, 540677.91, 544365.57, 551698.32, 554046.80, 556576.14,
    # patched THP=always sequential
    660711.43, 660936.88, 661275.57, 662540.65, 663417.25, 665546.99, 665680.49, 667564.03, 668555.96, 669202.36,
    # patched THP=always random
    582574.69, 583714.04, 587102.54, 587375.85, 588997.85, 589052.96, 593922.17, 594722.98, 596178.28, 599965.83,
    # patched THP=always Gaussian
    601707.98, 602055.03, 603020.28, 603335.93, 604519.55, 605086.48, 607405.59, 607570.79, 609009.54, 609875.98,
    # patched THP=never sequential
    507753.56, 509462.65, 509964.30, 510369.66, 515001.36, 531685.00, 543709.22, 545142.98, 548392.56, 550224.74,
    # patched THP=never random
    571017.21, 579705.57, 582801.51, 584475.82, 586247.73, 587209.97, 587354.87, 588661.14, 591237.23, 592712.76,
    # patched THP=never Gaussian
    666403.77, 669691.68, 670248.43, 672190.97, 672466.43, 674320.42, 674897.72, 677282.76, 678886.51, 687024.85
)

a <- array(v, dim = c(10, 3, 2, 2))

# baseline vs patched
for (thp in 1:2) {
    for (pattern in 1:3) {
        r <- t.test(a[, pattern, thp, 1], a[, pattern, thp, 2])
        print(r)

        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("thp%d pattern%d: no significance", thp, pattern)
        } else {
            s <- sprintf("thp%d pattern%d: [%.2f, %.2f]%%", thp, pattern, -p[2], -p[1])
        }
        print(s)
    }
}

# THP=always vs THP=never
for (kernel in 1:2) {
    for (pattern in 1:3) {
        r <- t.test(a[, pattern, 1, kernel], a[, pattern, 2, kernel])
        print(r)

        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("kernel%d pattern%d: no significance", kernel, pattern)
        } else {
            s <- sprintf("kernel%d pattern%d: [%.2f, %.2f]%%", kernel, pattern, -p[2], -p[1])
        }
        print(s)
    }
}

$ R -q -s -f raw_data.r

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -14.434, df = 9.1861, p-value = 1.269e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -167589.0 -122297.9
sample estimates:
mean of x mean of y
 519599.7  664543.2

[1] "thp1 pattern1: [23.54, 32.25]%"

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -6.7518, df = 9.1333, p-value = 7.785e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -187239.37  -93415.59
sample estimates:
mean of x mean of y
 450033.2  590360.7

[1] "thp1 pattern2: [20.76, 41.61]%"

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -13.805, df = 9.1933, p-value = 1.866e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -144460.0 -103892.6
sample estimates:
mean of x mean of y
 481182.4  605358.7

[1] "thp1 pattern3: [21.59, 30.02]%"

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -0.22059, df = 17.979, p-value = 0.8279
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -18690.31  15138.63
sample estimates:
mean of x mean of y
 525394.8  527170.6

[1] "thp2 pattern1: no significance"

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -29.606, df = 17.368, p-value = 2.611e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -81305.60 -70504.51
sample estimates:
mean of x mean of y
 509237.3  585142.4

[1] "thp2 pattern2: [13.85, 15.97]%"

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -20.02, df = 10.251, p-value = 1.492e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -158941.2 -127199.8
sample estimates:
mean of x mean of y
 531270.8  674341.4

[1] "thp2 pattern3: [23.94, 29.92]%"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = -0.50612, df = 14.14, p-value = 0.6206
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -30329.80  18739.71
sample estimates:
mean of x mean of y
 519599.7  525394.8

[1] "kernel1 pattern1: no significance"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = -2.8503, df = 9.1116, p-value = 0.01885
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -106104.69  -12303.49
sample estimates:
mean of x mean of y
 450033.2  509237.3

[1] "kernel1 pattern2: [2.73, 23.58]%"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = -4.4308, df = 16.918, p-value = 0.0003701
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -73947.84 -26229.02
sample estimates:
mean of x mean of y
 481182.4  531270.8

[1] "kernel1 pattern3: [5.45, 15.37]%"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = 23.374, df = 9.5538, p-value = 9.402e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 124194.1 150551.0
sample estimates:
mean of x mean of y
 664543.2  527170.6

[1] "kernel2 pattern1: [-22.65, -18.69]%"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = 1.96, df = 17.806, p-value = 0.06583
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  -379.4756 10816.1516
sample estimates:
mean of x mean of y
 590360.7  585142.4

[1] "kernel2 pattern2: no significance"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = -33.687, df = 13.354, p-value = 2.614e-14
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -73394.70 -64570.58
sample estimates:
mean of x mean of y
 605358.7  674341.4

[1] "kernel2 pattern3: [10.67, 12.12]%"