Re: RFC: Memory Tiering Kernel Interfaces (v2)

On 5/25/22 10:57 PM, Wei Xu wrote:
On Wed, May 25, 2022 at 3:01 AM Aneesh Kumar K V
<aneesh.kumar@xxxxxxxxxxxxx> wrote:

On 5/25/22 2:33 PM, Ying Huang wrote:
On Tue, 2022-05-24 at 22:32 -0700, Wei Xu wrote:
On Tue, May 24, 2022 at 1:24 AM Ying Huang <ying.huang@xxxxxxxxx> wrote:

On Tue, 2022-05-24 at 00:04 -0700, Wei Xu wrote:
On Thu, May 19, 2022 at 8:06 PM Ying Huang <ying.huang@xxxxxxxxx> wrote:


...


OK.  Just to confirm: does this mean that we will have fixed device IDs?
For example,

GPU               memtier255
DRAM (with CPU)   memtier0
PMEM              memtier1

When we add a new memtier, can it be memtier254 or memtier2?  The rank
value will determine the real demotion order.
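
To make the separation between device ID and rank concrete, here is a minimal Python sketch, not kernel code. The rank numbers are made up for illustration, and the direction (a higher rank meaning a faster tier that is demoted from first) is an assumption exposed as a parameter rather than something settled in this thread.

```python
def demotion_order(ranks, higher_rank_is_faster=True):
    """Order memtier names from the first tier demoted from to the last target.

    `ranks` maps a memtier name to its rank. Only the rank decides the
    position; the number in the device name plays no role.
    `higher_rank_is_faster` is an assumption about the rank direction.
    """
    return sorted(ranks, key=lambda name: ranks[name],
                  reverse=higher_rank_is_faster)


# Hypothetical ranks for the three tiers in the example above.
tiers = {"memtier255": 300, "memtier0": 200, "memtier1": 100}
print(" -> ".join(demotion_order(tiers)))

# A new tier can take any free device ID; its rank alone places it.
tiers["memtier2"] = 150
print(" -> ".join(demotion_order(tiers)))
```

With these made-up ranks, memtier2 lands between memtier0 and memtier1 even though its device ID is the largest of the three non-GPU tiers.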

I think you may need to send v3 to make sure everyone is on the same
page.


What we have implemented, which we will send as an RFC shortly, is below.

kvaneesh@ubuntu-guest:~$ cd /sys/devices/system/
kvaneesh@ubuntu-guest:/sys/devices/system$ pwd
/sys/devices/system
kvaneesh@ubuntu-guest:/sys/devices/system$ ls
clockevents  clocksource  container  cpu  edac  memory  memtier  mpic  node  power
kvaneesh@ubuntu-guest:/sys/devices/system$ cd memtier/
kvaneesh@ubuntu-guest:/sys/devices/system/memtier$ pwd
/sys/devices/system/memtier
kvaneesh@ubuntu-guest:/sys/devices/system/memtier$ ls
default_rank  max_rank  memtier1  power  uevent
kvaneesh@ubuntu-guest:/sys/devices/system/memtier$ cat default_rank
1
kvaneesh@ubuntu-guest:/sys/devices/system/memtier$ cat max_rank
3

For flexibility, we don't want max_rank to be interpreted as the
number of memory tiers.  Also, we want to leave gaps between rank values
so that new memtiers can be inserted when needed.  So I'd suggest
making max_rank a much larger value (e.g. 255).
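
A small Python sketch of why gaps matter. The helper name `rank_between` and the sample values are illustrative only. With densely packed ranks there is no free value between two neighbouring tiers, while spaced ranks leave room for a new tier.

```python
def rank_between(lower, upper):
    """Return an integer rank strictly between two existing ranks.

    Returns None when the two ranks are adjacent and no free value remains.
    """
    if upper - lower < 2:
        return None
    return lower + (upper - lower) // 2


dense = [1, 2, 3]          # ranks packed together, as with max_rank = 3
sparse = [100, 200, 300]   # ranks with gaps, possible with a larger max_rank

print(rank_between(dense[0], dense[1]))    # no room between 1 and 2
print(rank_between(sparse[0], sparse[1]))  # midpoint of 100 and 200
```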

kvaneesh@ubuntu-guest:/sys/devices/system/memtier$ cd memtier1/
kvaneesh@ubuntu-guest:/sys/devices/system/memtier/memtier1$ ls
nodelist  power  rank  subsystem  uevent
kvaneesh@ubuntu-guest:/sys/devices/system/memtier/memtier1$ cat nodelist
0-3
kvaneesh@ubuntu-guest:/sys/devices/system/memtier/memtier1$ cat rank
1
kvaneesh@ubuntu-guest:/sys/devices/system/memtier/memtier1$ cd ../../node/node1/
kvaneesh@ubuntu-guest:/sys/devices/system/node/node1$ cat memtier
1
kvaneesh@ubuntu-guest:/sys/devices/system/node/node1$
root@ubuntu-guest:/sys/devices/system/node/node1# echo 0 > memtier
root@ubuntu-guest:/sys/devices/system/node/node1# cat memtier
0
root@ubuntu-guest:/sys/devices/system/node/node1# cd ../../memtier/
root@ubuntu-guest:/sys/devices/system/memtier# ls
default_rank  max_rank  memtier0  memtier1  power  uevent
root@ubuntu-guest:/sys/devices/system/memtier# cd memtier0/
root@ubuntu-guest:/sys/devices/system/memtier/memtier0# cat nodelist
1
root@ubuntu-guest:/sys/devices/system/memtier/memtier0# cat rank
0
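
For anyone who wants to script against this layout, the following Python sketch reads the `rank` and `nodelist` files of each `memtierN` directory. It assumes only the file names visible in the session above; the interface is a prototype, so the layout may change. The demo runs against a mock directory tree rather than the real `/sys/devices/system/memtier`, and the `0,2-3` nodelist for memtier1 is an assumed end state that the session does not show.

```python
import os
import re
import tempfile


def parse_nodelist(text):
    """Expand a list such as '0-3' or '0,2-3' into a list of node numbers."""
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        nodes.extend(range(int(lo), int(hi or lo) + 1))
    return nodes


def read_memtiers(root):
    """Return {memtier name: {'rank': int, 'nodes': [int, ...]}} for `root`."""
    tiers = {}
    for entry in sorted(os.listdir(root)):
        # Skip non-tier entries such as default_rank, max_rank, power, uevent.
        if not re.fullmatch(r"memtier\d+", entry):
            continue
        base = os.path.join(root, entry)
        with open(os.path.join(base, "rank")) as f:
            rank = int(f.read())
        with open(os.path.join(base, "nodelist")) as f:
            nodes = parse_nodelist(f.read())
        tiers[entry] = {"rank": rank, "nodes": nodes}
    return tiers


def make_mock_tree(root, layout):
    """Create memtierN/rank and memtierN/nodelist files under `root`."""
    for name, (rank, nodelist) in layout.items():
        os.makedirs(os.path.join(root, name))
        with open(os.path.join(root, name, "rank"), "w") as f:
            f.write(f"{rank}\n")
        with open(os.path.join(root, name, "nodelist"), "w") as f:
            f.write(f"{nodelist}\n")


with tempfile.TemporaryDirectory() as root:
    # Mock of the state after node1 was moved to memtier0 (partly assumed).
    make_mock_tree(root, {"memtier0": (0, "1"), "memtier1": (1, "0,2-3")})
    for name, info in read_memtiers(root).items():
        print(name, info)
```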

It looks like the example here demonstrates the dynamic creation of
memtier0.  If so, how is the rank of memtier0 determined?  If we want
to support creating new memtiers at runtime, I think an explicit
interface that specifies both device ID and rank is preferred to avoid
implicit dependencies between device IDs and ranks.


Right now, to keep it all simpler, there is a 1:1 relationship between memory tier and rank value, i.e.

memory tier  rank
memtier0     100
memtier1     200
memtier2     300

Currently we limit this to a maximum of three tiers, which makes the mapping above easy. Once we really get dynamic tier creation, we should create a new memory tier with the highest possible rank value and then, once the tier is established, modify its rank to the desired value. There will be a kernel interface to add a node to a memory tier with a specific rank value, so drivers can do that if required.

I haven't gone to that implementation because I was hoping we could get to that later, when we really start requiring dynamic tier support.

I will share the patch series we have been working with. I have yet to add the documentation, but I will not wait for it to be complete, so that we can get some early testing and feedback.

-aneesh



