Re: [PATCH v5 0/8] DAMON based tiered memory management for CXL memory

Honggyu Kim <honggyu.kim@xxxxxx> · Fri, 14 Jun 2024 12:05:51 +0900

Hi SeongJae,

On Thu, 13 Jun 2024 10:46:04 -0700 SeongJae Park <sj@xxxxxxxxxx> wrote:
> Hi Honggyu,
> 
> On Thu, 13 Jun 2024 22:20:47 +0900 Honggyu Kim <honggyu.kim@xxxxxx> wrote:
> 
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> > 
> > It says that no implementation of the demote/promote DAMOS actions has
> > been made.  This patch series implements them for the physical address
> > space so that the scheme can be applied at the system-wide level.
> > 
> > Changes from RFC v4:
> > https://lore.kernel.org/20240512175447.75943-1-sj@xxxxxxxxxx
> >   1. Add usage and design documents
> >   2. Rename alloc_demote_folio to alloc_migrate_folio
> >   3. Add evaluation results with "demotion_enabled" true
> >   4. Rebase based on v6.10-rc3
> 
> I left comments on the new patches for the documentation.
> 
> [...]
> > 
> > Evaluation Results
> > ==================
> > 
> > All the result values are normalized to DRAM-only execution time,
> > because the workload cannot be faster than DRAM-only unless it hits
> > the peak bandwidth, and our redis test doesn't go beyond the bandwidth
> > limit.
> > 
> > So the DRAM-only execution time is the ideal result, unaffected by the
> > performance gap between DRAM and CXL.  The NUMA node environment is as
> > follows.
> > 
> >   node0 - local DRAM, 512GB with a CPU socket (fast tier)
> >   node1 - disabled
> >   node2 - CXL DRAM, 96GB, no CPU attached (slow tier)
> > 
> > The following is the result of generating a zipfian distribution
> > workload to redis-server; the numbers are averaged over 50 executions.
> > 
> >   1. YCSB zipfian distribution read only workload
> >   memory pressure with cold memory on node0 with 512GB of local DRAM.
> >   ====================+================================================+=========
> >                       |       cold memory occupied by mmap and memset  |
> >                       |   0G  440G  450G  460G  470G  480G  490G  500G |
> >   ====================+================================================+=========
> >   Execution time normalized to DRAM-only values                        | GEOMEAN
> >   --------------------+------------------------------------------------+---------
> >   DRAM-only           | 1.00     -     -     -     -     -     -     - | 1.00
> >   CXL-only            | 1.19     -     -     -     -     -     -     - | 1.19
> >   default             |    -  1.00  1.05  1.08  1.12  1.14  1.18  1.18 | 1.11
> >   DAMON tiered        |    -  1.03  1.03  1.03  1.03  1.03  1.07 *1.05 | 1.04
> >   DAMON lazy          |    -  1.04  1.03  1.04  1.05  1.06  1.06 *1.06 | 1.05
> >   ====================+================================================+=========
> >   CXL usage of redis-server in GB                                      | AVERAGE
> >   --------------------+------------------------------------------------+---------
> >   DRAM-only           |  0.0     -     -     -     -     -     -     - |  0.0
> >   CXL-only            | 51.4     -     -     -     -     -     -     - | 51.4
> >   default             |    -   0.6  10.6  20.5  30.5  40.5  47.6  50.4 | 28.7
> >   DAMON tiered        |    -   0.6   0.5   0.4   0.7   0.8   7.1   5.6 |  2.2
> >   DAMON lazy          |    -   0.5   3.0   4.5   5.4   6.4   9.4   9.1 |  5.5
> >   ====================+================================================+=========
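
For anyone who wants to cross-check the GEOMEAN column, below is a minimal
sketch that recomputes it from the per-column values in the table above.
The function and variable names are only illustrative and are not part of
the patchset or the evaluation scripts.  Since the inputs are already
rounded to two decimals, the result is only expected to match at that
precision.

```python
import math

def geomean(values):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Normalized execution times copied from the table (440G..500G columns).
# The dictionary keys are just labels for this sketch.
exec_time = {
    "default":      [1.00, 1.05, 1.08, 1.12, 1.14, 1.18, 1.18],
    "DAMON tiered": [1.03, 1.03, 1.03, 1.03, 1.03, 1.07, 1.05],
    "DAMON lazy":   [1.04, 1.03, 1.04, 1.05, 1.06, 1.06, 1.06],
}

for name, values in exec_time.items():
    # Print with two decimals, the same precision as the table.
    print(f"{name:<14} {geomean(values):.2f}")
```

The geometric mean is used here rather than the arithmetic mean because
the values are ratios normalized to the DRAM-only baseline.
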
> > 
> > Each test result is based on the exeuction environment as follows.
> 
> Nit.  s/exeuction/execution/

Thanks. Fixed it.

> [...]
> > In summary, the evaluation results show that DAMON memory management
> > with DAMOS_MIGRATE_{HOT,COLD} actions reduces the performance slowdown
> > compared to the "default" memory policy from 11% to 3~5% when the system
> > runs with high memory pressure on its fast tier DRAM nodes.
> > 
> > Having these DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD actions can make
> > tiered memory systems run more efficiently under high memory pressures.
> 
> Thank you very much for continuing this great work.
> 
> Other than trivial comments on documentation patches and the above typo, I have
> no particular concern on this patchset.  I'm looking forward to the next
> version.

I have addressed all your comments and sent v6.  Please have another
look.
https://lore.kernel.org/20240614030010.751-1-honggyu.kim@xxxxxx

Thanks very much for your review!

Thanks,
Honggyu

> 
> Thanks,
> SJ
> [...]