[RFC] memory tiering: use small chunk size and more tiers

We need some way to override the system default memory tiers.  Consider
the following example system:

type		abstract distance
----		-----------------
HBM		300
DRAM		1000
CXL_MEM		5000
PMEM		5100

Given a memory tier chunk size of 100, the default memory tiers would
be:

tier		abstract distance	types
                range
----		-----------------       -----
3		300-400			HBM
10		1000-1100		DRAM
50		5000-5100		CXL_MEM
51		5100-5200		PMEM

If we want to group CXL_MEM and PMEM into one tier, we have 2 choices:

1) Override the abstract distance of CXL_MEM or PMEM.  For example, if
we change the abstract distance of PMEM to 5050, the memory tiers
become,

tier		abstract distance	types
                range
----		-----------------       -----
3		300-400			HBM
10		1000-1100		DRAM
50		5000-5100		CXL_MEM, PMEM

2) Override the memory tier chunk size.  For example, if we change the
memory tier chunk size to 200, the memory tiers become,

tier		abstract distance	types
                range
----		-----------------       -----
1		200-400			HBM
5		1000-1200		DRAM
25		5000-5200		CXL_MEM, PMEM

But after some thought, I think choice 2) may not be good.  The
problem is that even if 2 abstract distances are almost the same, they
may be put in 2 different tiers if they sit on different sides of a
tier boundary.  For example, suppose the abstract distance of CXL_MEM
is 4990 while that of PMEM is 5010.  Although the difference between
the abstract distances is only 20, CXL_MEM and PMEM will be put in
different tiers if the tier chunk size is 50, 100, 200, 250, 500, ...,
because 5000, a multiple of each of these chunk sizes, lies between
them.  This makes choice 2) hard to use: it may be tricky to find a
tier chunk size that satisfies all requirements.

So I suggest abandoning choice 2) and using choice 1) only.  This makes
the overall design and the user space interface simpler and easier to
use.  The overall design of the abstract distance could be:

1. Use decimal (not power-of-2) values for the abstract distance and
   its chunk size.  This makes them more user friendly.

2. Make the tier chunk size as small as possible, for example 10.  By
   default, this puts different memory types in one memory tier only
   if their performance is almost the same.  We will not provide an
   interface to override the chunk size.

3. Make the abstract distance of normal DRAM large enough, for
   example 1000.  Then 100 tiers can be defined with abstract distances
   below that of DRAM, which is more than enough in practice.

4. To override the default memory tiers, just override the abstract
   distances of some memory types with a per-memory-type interface.

This patch applies the design choices above to the existing code.

Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>
Cc: Alistair Popple <apopple@xxxxxxxxxx>
Cc: Bharata B Rao <bharata@xxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx>
Cc: Hesham Almatary <hesham.almatary@xxxxxxxxxx>
Cc: Jagdish Gediya <jvgediya.oss@xxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Tim Chen <tim.c.chen@xxxxxxxxx>
Cc: Wei Xu <weixugc@xxxxxxxxxx>
Cc: Yang Shi <shy828301@xxxxxxxxx>
---
 include/linux/memory-tiers.h | 7 +++----
 mm/memory-tiers.c            | 7 +++----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 965009aa01d7..2e39d9a6c8ce 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -7,17 +7,16 @@
 #include <linux/kref.h>
 #include <linux/mmzone.h>
 /*
- * Each tier cover a abstrace distance chunk size of 128
+ * Each tier covers an abstract distance chunk size of 10
  */
-#define MEMTIER_CHUNK_BITS	7
-#define MEMTIER_CHUNK_SIZE	(1 << MEMTIER_CHUNK_BITS)
+#define MEMTIER_CHUNK_SIZE	10
 /*
  * Smaller abstract distance values imply faster (higher) memory tiers. Offset
  * the DRAM adistance so that we can accommodate devices with a slightly lower
  * adistance value (slightly faster) than default DRAM adistance to be part of
  * the same memory tier.
  */
-#define MEMTIER_ADISTANCE_DRAM	((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1))
+#define MEMTIER_ADISTANCE_DRAM	((100 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE / 2))
 #define MEMTIER_HOTPLUG_PRIO	100
 
 struct memory_tier;
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index fa8c9d07f9ce..e03011428fa5 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -165,11 +165,10 @@ static struct memory_tier *find_create_memory_tier(struct memory_dev_type *memty
 	bool found_slot = false;
 	struct memory_tier *memtier, *new_memtier;
 	int adistance = memtype->adistance;
-	unsigned int memtier_adistance_chunk_size = MEMTIER_CHUNK_SIZE;
 
 	lockdep_assert_held_once(&memory_tier_lock);
 
-	adistance = round_down(adistance, memtier_adistance_chunk_size);
+	adistance = rounddown(adistance, MEMTIER_CHUNK_SIZE);
 	/*
 	 * If the memtype is already part of a memory tier,
 	 * just return that.
@@ -204,7 +203,7 @@ static struct memory_tier *find_create_memory_tier(struct memory_dev_type *memty
 	else
 		list_add_tail(&new_memtier->list, &memory_tiers);
 
-	new_memtier->dev.id = adistance >> MEMTIER_CHUNK_BITS;
+	new_memtier->dev.id = adistance / MEMTIER_CHUNK_SIZE;
 	new_memtier->dev.bus = &memory_tier_subsys;
 	new_memtier->dev.release = memory_tier_device_release;
 	new_memtier->dev.groups = memtier_dev_groups;
@@ -641,7 +640,7 @@ static int __init memory_tier_init(void)
 #endif
 	mutex_lock(&memory_tier_lock);
 	/*
-	 * For now we can have 4 faster memory tiers with smaller adistance
+	 * For now we can have 100 faster memory tiers with smaller adistance
 	 * than default DRAM tier.
 	 */
 	default_dram_type = alloc_memory_type(MEMTIER_ADISTANCE_DRAM);
-- 
2.35.1
