Re: [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes

On Mon, 30 Mar 2020 13:50:35 +0200
SeongJae Park <sjpark@xxxxxxxxxx> wrote:

> From: SeongJae Park <sjpark@xxxxxxxxx>
> 
> DAMON[1] can be used as a primitive for data access-aware memory management
> optimizations.  That said, users who want such optimizations should run DAMON,
> read the monitoring results, analyze them, plan a new memory management scheme,
> and apply the new scheme by themselves.  Such efforts will be inevitable for
> some complicated optimizations.
> 
> However, in many other cases, the users could simply want the system to apply
> a memory management action to a memory region of a specific size that has kept
> a specific access frequency for a specific time.  For example, "page out a
> memory region larger than 100 MiB that has kept only rare accesses for more
> than 2 minutes", or "do not use THP for a memory region larger than 2 MiB that
> has rarely been accessed for more than 1 second".
> 
> This RFC patchset makes DAMON handle such data access monitoring-based
> operation schemes.  With this change, users can apply data access-aware
> optimizations by simply specifying their schemes to DAMON.


Hi SeongJae,

I'm wondering if I'm misreading the results below or a data handling mixup
occurred.  See inline.

Thanks,

Jonathan

> 
> 
> Evaluations
> ===========
> 
> Setup
> -----
> 
> On my personal QEMU/KVM based virtual machine on an Intel i7 host machine
> running Ubuntu 18.04, I measure runtime and consumed system memory while
> running various realistic workloads with several configurations.  I use 13 and
> 12 workloads from the PARSEC3[3] and SPLASH-2X[4] benchmark suites,
> respectively.  I use a set of wrapper scripts[5] for setup and running of the
> workloads.  On top of this patchset, I also applied the DAMON-based operation
> schemes patchset[6] for this evaluation.
> 
> Measurement
> ~~~~~~~~~~~
> 
> For the measurement of the amount of consumed memory in the system-global
> scope, I drop caches before starting each of the workloads and monitor
> 'MemFree' in the '/proc/meminfo' file.  To make the results more stable, I
> repeat the runs 5 times and average the results.  You can find the stdev, min,
> and max of the numbers among the repeated runs in the appendix below.
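> 
> For illustration, here is a minimal sketch of the idea behind the memory
> measurement (this is not the actual wrapper scripts[5]; the workload command,
> the sampling interval, and the file handling are assumptions):
> 
>     # sketch: sample system-wide free memory while a workload runs
>     import subprocess, time
> 
>     def memfree_kb():
>         # 'MemFree' in /proc/meminfo is reported in KiB
>         with open('/proc/meminfo') as f:
>             for line in f:
>                 if line.startswith('MemFree:'):
>                     return int(line.split()[1])
> 
>     def run_once(cmd):
>         # drop page cache, dentries, and inodes before each run
>         with open('/proc/sys/vm/drop_caches', 'w') as f:
>             f.write('3')
>         baseline = memfree_kb()
>         proc = subprocess.Popen(cmd)
>         samples = []
>         while True:
>             samples.append(baseline - memfree_kb())  # memory consumed so far
>             if proc.poll() is not None:
>                 break
>             time.sleep(1)
>         return sum(samples) / len(samples)
> 
>     # repeat 5 times and average, as described above
>     results = [run_once(['./run_workload.sh']) for _ in range(5)]
>     print('memused.avg (KiB): %.1f' % (sum(results) / len(results)))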
> 
> Configurations
> ~~~~~~~~~~~~~~
> 
> The configurations I use are as below.
> 
> orig: Linux v5.5 with the 'madvise' THP policy
> rec: 'orig' plus DAMON running with the record feature
> thp: same as 'orig', but with the 'always' THP policy
> ethp: 'orig' plus a DAMON operation scheme[6], 'efficient THP'
> prcl: 'orig' plus a DAMON operation scheme, 'proactive reclaim[7]'
> 
> I use 'rec' to measure DAMON's overhead on the target workloads and system
> memory.  The remaining configs, 'thp', 'ethp', and 'prcl', are for measuring
> DAMON's monitoring accuracy.
> 
> 'ethp' and 'prcl' are simple DAMON-based operation schemes developed as proofs
> of concept of DAMON.  'ethp' reduces the memory space waste of THP by using
> DAMON to decide promotion and demotion of huge pages, while 'prcl' is similar
> to the original work[7].  Those are implemented as below:
> 
> # format: <min/max size> <min/max frequency (0-100)> <min/max age> <action>
> # ethp: Use huge pages if a region >2MB shows >5% access rate, use regular
> # pages if a region >2MB shows <5% access rate for >1 second
> 2M null    5 null    null null    hugepage
> 2M null    null 5    1s null      nohugepage
> 
> # prcl: If a region >4KB shows <5% access rate for >5 seconds, page out.
> 4K null    null 5    5s null      pageout
> 
> Note that both 'ethp' and 'prcl' are designed with only my straightforward
> intuition, because they are only for proof of concept and for checking the
> monitoring accuracy of DAMON.  In other words, they are not for production
> use.  For production use, they should be tuned further.
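> 
> The human friendly size and time tokens above ('2M', '1s', 'null', ...)
> eventually have to become plain numbers for the kernel.  As a rough
> illustration of that kind of conversion (this is only a sketch of the idea,
> not the actual logic in tools/damon/_convert_damos.py):
> 
>     # sketch: convert size ('2M', '4K') and time ('1s') tokens to numbers
>     UNITS = {'K': 1 << 10, 'M': 1 << 20, 'G': 1 << 30}
> 
>     def to_bytes(token, none_value):
>         # 'null' means "no limit"; none_value is the caller's substitute
>         if token == 'null':
>             return none_value
>         if token[-1] in UNITS:
>             return int(token[:-1]) * UNITS[token[-1]]
>         return int(token)
> 
>     def to_ms(token, none_value):
>         if token == 'null':
>             return none_value
>         if token.endswith('s'):
>             return int(token[:-1]) * 1000
>         return int(token)
> 
>     print(to_bytes('2M', 0), to_ms('1s', 0))    # 2097152 1000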
> 
> 
> [1] "Redis latency problems troubleshooting", https://redis.io/topics/latency
> [2] "Disable Transparent Huge Pages (THP)",
>     https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
> [3] "The PARSEC Benchmark Suite", https://parsec.cs.princeton.edu/index.htm
> [4] "SPLASH-2x", https://parsec.cs.princeton.edu/parsec3-doc.htm#splash2x
> [5] "parsec3_on_ubuntu", https://github.com/sjp38/parsec3_on_ubuntu
> [6] "[RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation
>     Schemes",
>     https://lore.kernel.org/linux-mm/20200303121406.20954-1-sjpark@xxxxxxxxxx/
> [7] "Proactively reclaiming idle memory", https://lwn.net/Articles/787611/
> 
> 
> Results
> -------
> 
> The below two tables show the measurement results.  The runtimes are in
> seconds while the memory usages are in KiB.  Each configuration except 'orig'
> shows its overhead relative to 'orig', in percent, within parentheses.
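> 
> The numbers in parentheses are simply the relative difference from 'orig'.  As
> a trivial illustration of the calculation (not part of the patchset or its
> tooling):
> 
>     def overhead_percent(orig, other):
>         # e.g. overhead_percent(107.594, 111.916) -> ~4.02, the 'prcl'
>         # runtime overhead of parsec3/blackscholes below
>         return (other - orig) / orig * 100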
> 
> runtime                 orig     rec      (overhead) thp      (overhead) ethp     (overhead) prcl     (overhead)
> parsec3/blackscholes    107.594  107.956  (0.34)     106.750  (-0.78)    107.672  (0.07)     111.916  (4.02)    
> parsec3/bodytrack       79.230   79.368   (0.17)     78.908   (-0.41)    79.705   (0.60)     80.423   (1.50)    
> parsec3/canneal         142.831  143.810  (0.69)     123.530  (-13.51)   133.778  (-6.34)    144.998  (1.52)    
> parsec3/dedup           11.986   11.959   (-0.23)    11.762   (-1.87)    12.028   (0.35)     13.313   (11.07)   
> parsec3/facesim         210.125  209.007  (-0.53)    205.226  (-2.33)    207.766  (-1.12)    209.815  (-0.15)   
> parsec3/ferret          191.601  191.177  (-0.22)    190.420  (-0.62)    191.775  (0.09)     192.638  (0.54)    
> parsec3/fluidanimate    212.735  212.970  (0.11)     209.151  (-1.68)    211.904  (-0.39)    218.573  (2.74)    
> parsec3/freqmine        291.225  290.873  (-0.12)    289.258  (-0.68)    289.884  (-0.46)    298.373  (2.45)    
> parsec3/raytrace        118.289  119.586  (1.10)     119.045  (0.64)     119.064  (0.66)     137.919  (16.60)   
> parsec3/streamcluster   323.565  328.168  (1.42)     279.565  (-13.60)   287.452  (-11.16)   333.244  (2.99)    
> parsec3/swaptions       155.140  155.473  (0.21)     153.816  (-0.85)    156.423  (0.83)     156.237  (0.71)    
> parsec3/vips            58.979   59.311   (0.56)     58.733   (-0.42)    59.005   (0.04)     61.062   (3.53)    
> parsec3/x264            70.539   68.413   (-3.01)    64.760   (-8.19)    67.180   (-4.76)    68.103   (-3.45)   
> splash2x/barnes         80.414   81.751   (1.66)     73.585   (-8.49)    80.232   (-0.23)    115.753  (43.95)   
> splash2x/fft            33.902   34.111   (0.62)     24.228   (-28.53)   29.926   (-11.73)   44.438   (31.08)   
> splash2x/lu_cb          85.556   86.001   (0.52)     84.538   (-1.19)    86.000   (0.52)     91.447   (6.89)    
> splash2x/lu_ncb         93.399   93.652   (0.27)     90.463   (-3.14)    94.008   (0.65)     93.901   (0.54)    
> splash2x/ocean_cp       45.253   45.191   (-0.14)    43.049   (-4.87)    44.022   (-2.72)    46.588   (2.95)    
> splash2x/ocean_ncp      86.927   87.065   (0.16)     50.747   (-41.62)   86.855   (-0.08)    199.553  (129.57)  
> splash2x/radiosity      91.433   91.511   (0.09)     90.626   (-0.88)    91.865   (0.47)     104.524  (14.32)   
> splash2x/radix          31.923   32.023   (0.31)     25.194   (-21.08)   32.035   (0.35)     39.231   (22.89)   
> splash2x/raytrace       84.367   84.677   (0.37)     82.417   (-2.31)    83.505   (-1.02)    84.857   (0.58)    
> splash2x/volrend        87.499   87.495   (-0.00)    86.775   (-0.83)    87.311   (-0.21)    87.511   (0.01)    
> splash2x/water_nsquared 236.397  236.759  (0.15)     219.902  (-6.98)    224.228  (-5.15)    238.562  (0.92)    
> splash2x/water_spatial  89.646   89.767   (0.14)     89.735   (0.10)     90.347   (0.78)     103.585  (15.55)   
> total                   3020.570 3028.080 (0.25)     2852.190 (-5.57)    2953.960 (-2.21)    3276.550 (8.47)    
> 
> 
> memused.avg             orig         rec          (overhead) thp          (overhead) ethp         (overhead) prcl         (overhead)
> parsec3/blackscholes    1785916.600  1834201.400  (2.70)     1826249.200  (2.26)     1828079.200  (2.36)     1712210.600  (-4.13)   
> parsec3/bodytrack       1415049.400  1434317.600  (1.36)     1423715.000  (0.61)     1430392.600  (1.08)     1435136.000  (1.42)    
> parsec3/canneal         1043489.800  1058617.600  (1.45)     1040484.600  (-0.29)    1048664.800  (0.50)     1050280.000  (0.65)    
> parsec3/dedup           2414453.200  2458493.200  (1.82)     2411379.400  (-0.13)    2400516.000  (-0.58)    2461120.800  (1.93)    
> parsec3/facesim         541597.200   550097.400   (1.57)     544364.600   (0.51)     553240.000   (2.15)     552316.400   (1.98)    
> parsec3/ferret          317986.600   332346.000   (4.52)     320218.000   (0.70)     331085.000   (4.12)     330895.200   (4.06)    
> parsec3/fluidanimate    576183.400   585442.000   (1.61)     577780.200   (0.28)     587703.400   (2.00)     506501.000   (-12.09)  
> parsec3/freqmine        990869.200   997817.000   (0.70)     990350.400   (-0.05)    997669.000   (0.69)     763325.800   (-22.96)  
> parsec3/raytrace        1748370.800  1757109.200  (0.50)     1746153.800  (-0.13)    1757830.400  (0.54)     1581455.800  (-9.55)   
> parsec3/streamcluster   121521.800   140452.400   (15.58)    129725.400   (6.75)     132266.000   (8.84)     130558.200   (7.44)    
> parsec3/swaptions       15592.400    29018.800    (86.11)    14765.800    (-5.30)    27260.200    (74.83)    26631.600    (70.80)   
> parsec3/vips            2957567.600  2967993.800  (0.35)     2956623.200  (-0.03)    2973062.600  (0.52)     2951402.000  (-0.21)   
> parsec3/x264            3169012.400  3175048.800  (0.19)     3190345.400  (0.67)     3189353.000  (0.64)     3172924.200  (0.12)    
> splash2x/barnes         1209066.000  1213125.400  (0.34)     1217261.400  (0.68)     1209661.600  (0.05)     921041.800   (-23.82)  
> splash2x/fft            9359313.200  9195213.000  (-1.75)    9377562.400  (0.19)     9050957.600  (-3.29)    9517977.000  (1.70)    
> splash2x/lu_cb          514966.200   522939.400   (1.55)     520870.400   (1.15)     522635.000   (1.49)     329933.600   (-35.93)  
> splash2x/lu_ncb         514180.400   525974.800   (2.29)     521420.200   (1.41)     521063.600   (1.34)     523557.000   (1.82)    
> splash2x/ocean_cp       3346493.400  3288078.000  (-1.75)    3382253.800  (1.07)     3289477.600  (-1.70)    3260810.400  (-2.56)   
> splash2x/ocean_ncp      3909966.400  3882968.800  (-0.69)    7037196.000  (79.98)    4046363.400  (3.49)     3471452.400  (-11.22)  
> splash2x/radiosity      1471119.400  1470626.800  (-0.03)    1482604.200  (0.78)     1472718.400  (0.11)     546893.600   (-62.82)  
> splash2x/radix          1748360.800  1729163.400  (-1.10)    1371463.200  (-21.56)   1701993.600  (-2.65)    1817519.600  (3.96)    
> splash2x/raytrace       46670.000    60172.200    (28.93)    51901.600    (11.21)    60782.600    (30.24)    52644.800    (12.80)   
> splash2x/volrend        150666.600   167444.200   (11.14)    151335.200   (0.44)     163345.000   (8.41)     162760.000   (8.03)    
> splash2x/water_nsquared 45720.200    59422.400    (29.97)    46031.000    (0.68)     61801.400    (35.17)    62627.000    (36.98)   
> splash2x/water_spatial  663052.200   672855.800   (1.48)     665787.600   (0.41)     674696.200   (1.76)     471052.600   (-28.96)  
> total                   40077300.000 40108900.000 (0.08)     42997900.000 (7.29)     40032700.000 (-0.11)    37813000.000 (-5.65)   
> 
> 
> DAMON Overheads
> ~~~~~~~~~~~~~~~
> 
> In total, the DAMON recording feature incurs 0.25% runtime overhead (up to
> 1.66% in the worst case, with 'splash2x/barnes') and 0.08% memory space
> overhead.
> 
> For convenient test runs of 'rec', I use a Python wrapper.  The wrapper
> constantly consumes about 10-15 MB of memory.  This shows up as a high memory
> overhead if the target workload has a small memory footprint.  In detail, 16%,
> 86%, 29%, 11%, and 30% overheads are shown for parsec3/streamcluster (125
> MiB), parsec3/swaptions (15 MiB), splash2x/raytrace (45 MiB), splash2x/volrend
> (151 MiB), and splash2x/water_nsquared (46 MiB), respectively.  Nonetheless,
> the overheads are not from DAMON but from the wrapper, and thus should be
> ignored.  This fake memory overhead continues in 'ethp' and 'prcl', as those
> configurations also use the Python wrapper.
> 
> 
> Efficient THP
> ~~~~~~~~~~~~~
> 
> The THP 'always' enabled policy achieves a 5.57% speedup but incurs a 7.29%
> memory overhead.  It achieves a 41.62% speedup in the best case, but a 79.98%
> memory overhead in the worst case.  Interestingly, both the best and worst
> cases are with 'splash2x/ocean_ncp'.

The results above don't seem to support this any more?

> runtime                 orig     rec      (overhead) thp      (overhead) ethp     (overhead) prcl     (overhead)
> splash2x/ocean_ncp      86.927   87.065   (0.16)     50.747   (-41.62)   86.855   (-0.08)    199.553  (129.57) 




> 
> The two-line implementation of the data access monitoring-based THP version
> ('ethp') shows a 2.21% speedup and a -0.11% memory overhead.  In other words,
> 'ethp' removes 100% of the THP memory waste while preserving 39.67% of the THP
> speedup in total.
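> 
> (The 39.67% figure is roughly the ratio of the total speedups, 2.21 / 5.57,
> i.e. about 0.40; the memory waste counts as fully removed because the total
> 'ethp' memory overhead is negative, -0.11%, versus 7.29% for 'thp'.)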
> 
> 
> Proactive Reclamation
> ~~~~~~~~~~~~~~~~~~~~~
> 
> As in the original work[7], I use a 'zram' swap device for this configuration.
> 
> In total, the one-line implementation of proactive reclamation, 'prcl', incurs
> an 8.47% runtime overhead while achieving a 5.65% reduction of system memory
> usage.
> 
> Nonetheless, as the memory usage is calculated from 'MemFree' in
> '/proc/meminfo', it includes the SwapCached pages.  As the swap-cached pages
> can easily be evicted, I also measured the resident set size (RSS) of the
> workloads:
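> 
> As a minimal sketch of how such an RSS sample can be taken (illustrative only;
> the actual measurement is done by the wrapper scripts[5]):
> 
>     def rss_kb(pid):
>         # 'VmRSS' in /proc/<pid>/status is the resident set size, in KiB
>         with open('/proc/%d/status' % pid) as f:
>             for line in f:
>                 if line.startswith('VmRSS:'):
>                     return int(line.split()[1])
>         return 0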
> 
> rss.avg                 orig         rec          (overhead) thp          (overhead) ethp         (overhead) prcl         (overhead)
> parsec3/blackscholes    592502.000   589764.400   (-0.46)    592132.600   (-0.06)    593702.000   (0.20)     406639.400   (-31.37)  
> parsec3/bodytrack       32365.400    32195.000    (-0.53)    32210.800    (-0.48)    32114.600    (-0.77)    21537.600    (-33.45)  
> parsec3/canneal         839904.200   840292.200   (0.05)     836866.400   (-0.36)    838263.200   (-0.20)    837895.800   (-0.24)   
> parsec3/dedup           1208337.200  1218465.600  (0.84)     1233278.600  (2.06)     1200490.200  (-0.65)    882911.400   (-26.93)  
> parsec3/facesim         311380.800   311363.600   (-0.01)    315642.600   (1.37)     312573.400   (0.38)     310257.400   (-0.36)   
> parsec3/ferret          99514.800    99542.000    (0.03)     100454.200   (0.94)     99879.800    (0.37)     89679.200    (-9.88)   
> parsec3/fluidanimate    531760.800   531735.200   (-0.00)    531865.400   (0.02)     531940.800   (0.03)     440781.000   (-17.11)  
> parsec3/freqmine        552455.400   552882.600   (0.08)     555793.600   (0.60)     553019.800   (0.10)     58067.000    (-89.49)  
> parsec3/raytrace        894798.400   894953.400   (0.02)     892223.400   (-0.29)    893012.400   (-0.20)    315259.800   (-64.77)  
> parsec3/streamcluster   110780.400   110856.800   (0.07)     110954.000   (0.16)     111310.800   (0.48)     108066.800   (-2.45)   
> parsec3/swaptions       5614.600     5645.600     (0.55)     5553.200     (-1.09)    5552.600     (-1.10)    3251.800     (-42.08)  
> parsec3/vips            31942.200    31752.800    (-0.59)    32042.600    (0.31)     32226.600    (0.89)     29012.200    (-9.17)   
> parsec3/x264            81770.800    81609.200    (-0.20)    82800.800    (1.26)     82612.200    (1.03)     81805.800    (0.04)    
> splash2x/barnes         1216515.600  1217113.800  (0.05)     1225605.600  (0.75)     1217325.000  (0.07)     540108.400   (-55.60)  
> splash2x/fft            9668660.600  9751350.800  (0.86)     9773806.400  (1.09)     9613555.400  (-0.57)    7951241.800  (-17.76)  
> splash2x/lu_cb          510368.800   510095.800   (-0.05)    514350.600   (0.78)     510276.000   (-0.02)    311584.800   (-38.95)  
> splash2x/lu_ncb         509904.800   510001.600   (0.02)     513847.000   (0.77)     510073.400   (0.03)     509905.600   (0.00)    
> splash2x/ocean_cp       3389550.600  3404466.000  (0.44)     3443363.600  (1.59)     3410388.000  (0.61)     3330608.600  (-1.74)   
> splash2x/ocean_ncp      3923723.200  3911148.200  (-0.32)    7175800.400  (82.88)    4104482.400  (4.61)     2030525.000  (-48.25)  
> splash2x/radiosity      1472994.600  1475946.400  (0.20)     1485636.800  (0.86)     1476193.000  (0.22)     262161.400   (-82.20)  
> splash2x/radix          1750329.800  1765697.000  (0.88)     1413304.000  (-19.25)   1754154.400  (0.22)     1516142.600  (-13.38)  
> splash2x/raytrace       23149.600    23208.000    (0.25)     28574.400    (23.43)    26694.600    (15.31)    16257.800    (-29.77)  
> splash2x/volrend        43968.800    43919.000    (-0.11)    44087.600    (0.27)     44224.000    (0.58)     32484.400    (-26.12)  
> splash2x/water_nsquared 29348.000    29338.400    (-0.03)    29604.600    (0.87)     29779.400    (1.47)     23644.800    (-19.43)  
> splash2x/water_spatial  655263.600   655097.800   (-0.03)    655199.200   (-0.01)    656282.400   (0.16)     379816.800   (-42.04)  
> total                   28486900.000 28598400.000 (0.39)     31625000.000 (11.02)    28640100.000 (0.54)     20489600.000 (-28.07)  
> 
> In total, the resident sets were reduced by 28.07%.
> 
> With parsec3/freqmine, 'prcl' reduced the system memory usage by 22.96% and
> the resident set by 89.49% while incurring only a 2.45% runtime overhead.
> 
> 
> Sequence Of Patches
> ===================
> 
> The patches are based on v5.6 plus the v7 DAMON patchset[1] and Minchan's
> ``do_madvise()`` patch[2].  Minchan's patch was necessary for reusing the
> ``madvise()`` code in DAMON.  You can also clone the complete git tree:
> 
>     $ git clone git://github.com/sjp38/linux -b damos/rfc/v5
> 
> The tree is also available on the web:
> https://github.com/sjp38/linux/releases/tag/damos/rfc/v5
> 
> 
> [1] https://lore.kernel.org/linux-mm/20200318112722.30143-1-sjpark@xxxxxxxxxx/
> [2] https://lore.kernel.org/linux-mm/20200302193630.68771-2-minchan@xxxxxxxxxx/
> 
> The first patch allows DAMON to reuse the ``madvise()`` code for the actions.
> The second patch accounts for the age of each region.  The third patch
> implements the handling of the schemes in DAMON and exports a kernel space
> programming interface for it.  The fourth patch implements a debugfs interface
> for privileged people and programs.  The fifth and sixth patches add kunit
> tests and selftests for these changes, respectively, and finally the seventh
> patch modifies the user space tool for DAMON to support description and
> application of schemes in a human friendly way.
> 
> 
> Patch History
> =============
> 
> Changes from RFC v4
> (https://lore.kernel.org/linux-mm/20200303121406.20954-1-sjpark@xxxxxxxxxx/)
>  - Handle CONFIG_ADVISE_SYSCALL
>  - Clean up code (Jonathan Cameron)
>  - Update test results
>  - Rebase on v5.6 + DAMON v7
> 
> Changes from RFC v3
> (https://lore.kernel.org/linux-mm/20200225102300.23895-1-sjpark@xxxxxxxxxx/)
>  - Add Reviewed-by from Brendan Higgins
>  - Code cleanup: Modularize madvise() call
>  - Fix a trivial bug in the wrapper python script
>  - Add more stable and detailed evaluation results with updated ETHP scheme
> 
> Changes from RFC v2
> (https://lore.kernel.org/linux-mm/20200218085309.18346-1-sjpark@xxxxxxxxxx/)
>  - Fix the aging mechanism for better 'old region' selection
>  - Add more kunittests and kselftests for this patchset
>  - Support more human friendly description and application of 'schemes'
> 
> Changes from RFC v1
> (https://lore.kernel.org/linux-mm/20200210150921.32482-1-sjpark@xxxxxxxxxx/)
>  - Properly adjust age accounting related properties after splitting, merging,
>    and action applying
> 
> SeongJae Park (7):
>   mm/madvise: Export do_madvise() to external GPL modules
>   mm/damon: Account age of target regions
>   mm/damon: Implement data access monitoring-based operation schemes
>   mm/damon/schemes: Implement a debugfs interface
>   mm/damon-test: Add kunit test case for regions age accounting
>   mm/damon/selftests: Add 'schemes' debugfs tests
>   damon/tools: Support more human friendly 'schemes' control
> 
>  include/linux/damon.h                         |  29 ++
>  mm/damon-test.h                               |   5 +
>  mm/damon.c                                    | 428 +++++++++++++++++-
>  mm/madvise.c                                  |   1 +
>  tools/damon/_convert_damos.py                 | 125 +++++
>  tools/damon/_damon.py                         | 143 ++++++
>  tools/damon/damo                              |   7 +
>  tools/damon/record.py                         | 135 +-----
>  tools/damon/schemes.py                        | 105 +++++
>  .../testing/selftests/damon/debugfs_attrs.sh  |  29 ++
>  10 files changed, 878 insertions(+), 129 deletions(-)
>  create mode 100755 tools/damon/_convert_damos.py
>  create mode 100644 tools/damon/_damon.py
>  create mode 100644 tools/damon/schemes.py
> 





