Re: [RFC PATCH 1/4] mm/memory_hotplug: Add interface for runtime (de)configuration of memory

On 03.12.24 15:33, Sumanth Korikkar wrote:
On Mon, Dec 02, 2024 at 05:55:19PM +0100, David Hildenbrand wrote:
Hi!

Not completely what I had in mind, especially not that we need something
that generic without any indication of ranges :)

In general, the flow is as follows:

1) Driver detects memory and adds it
2) Something auto-onlines that memory (e.g., udev rule)
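(Just for illustration, a typical auto-online udev rule looks roughly like:
SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online")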

For dax/kmem, 1) can be controlled using devdax, and usually it also tries
to take care of 2).

s390x standby storage really is the weird thing here, because it does 1) and
doesn't want 2). It shouldn't do 1) until a user wants to make use of
standby memory.

Hi David,

Hi,

sorry for the late reply. Cleaning up (some of) my inbox before Christmas, and I realized I skipped this mail.


The current RFC design doesn't do 1) until the user initiates it.

The current RFC design relies on the fact that there cannot be memory
holes when standby memory is available (which holds true for both
LPARs and z/VM guests).

With the total count of online plus standby memory ranges
(max_configurable), a prototype lsmem/chmem could determine the memory
ranges which are not yet configured,
i.e. configurable_memory = max_configurable - online ranges from sysfs
(/sys/devices/system/memory/memory*).
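
Roughly, that calculation would be (sketch only; the sysfs location of
max_configurable below is just a placeholder):

# memory blocks currently added (configured); no holes assumed
online=$(ls -d /sys/devices/system/memory/memory[0-9]* | wc -l)
# total number of configurable blocks (placeholder path)
max=$(cat /sys/devices/system/memory/max_configurable)
# with no holes, blocks [online, max) are the not-yet-configured standby ones
echo "deconfigured blocks: $((max - online))"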

Example prototype implementation of lsmem/chmem looks like:
./lsmem -o RANGE,SIZE,STATE,BLOCK,ALTMAP
RANGE                                 SIZE        STATE  BLOCK ALTMAP
0x0000000000000000-0x00000002ffffffff  12G       online   0-95      0
0x0000000300000000-0x00000003ffffffff   4G deconfigured 96-127      -

# Configure range with altmap
./chmem -c 0x0000000300000000-0x00000003ffffffff -a
./lsmem -o RANGE,SIZE,STATE,BLOCK,ALTMAP
RANGE                                 SIZE   STATE  BLOCK ALTMAP
0x0000000000000000-0x00000002ffffffff  12G  online   0-95      0
0x0000000300000000-0x00000003ffffffff   4G offline 96-127      1


# Online range
./chmem -e 0x0000000300000000-0x00000003ffffffff &&
./lsmem -o RANGE,SIZE,STATE,BLOCK,ALTMAP
RANGE                                 SIZE  STATE  BLOCK ALTMAP
0x0000000000000000-0x00000002ffffffff  12G online   0-95      0
0x0000000300000000-0x00000003ffffffff   4G online 96-127      1

Memory block size:       128M
Total online memory:      16G
Total offline memory:      0B
Total deconfigured:        0B

# offline range
./chmem -d 0x0000000300000000-0x00000003ffffffff &&
./lsmem -o RANGE,SIZE,STATE,BLOCK,ALTMAP
RANGE                                 SIZE   STATE  BLOCK ALTMAP
0x0000000000000000-0x00000002ffffffff  12G  online   0-95      0
0x0000000300000000-0x00000003ffffffff   4G offline 96-127      1

Memory block size:       128M
Total online memory:      12G
Total offline memory:      4G
Total deconfigured:        0B

# Deconfigure range.
./chmem -g 0x0000000300000000-0x00000003ffffffff &&
./lsmem -o RANGE,SIZE,STATE,BLOCK,ALTMAP
RANGE                                 SIZE        STATE  BLOCK ALTMAP
0x0000000000000000-0x00000002ffffffff  12G       online   0-95      0
0x0000000300000000-0x00000003ffffffff   4G deconfigured 96-127      -

Memory block size:       128M
Total online memory:      12G
Total offline memory:      0B
Total deconfigured:        4G

Maybe "standby memory" might make it clearer. The concept is s390x specific, and it will likely stay s390x specific.

I like the idea (frontend/tool interface), all we need is a way for these commands to detect ranges and turn them from standby into usable memory.


The user can still determine the available memory ranges and make them
configurable using tools like lsmem or chmem with this approach, at
least on s390.

My thinking was that s390x would expose the standby memory ranges somewhere
arch-specific in sysfs. From there, one could simply trigger the adding
(maybe specifying, e.g., memmap_on_memory) of selected ranges.

As far as I understand, the sysfs interface limits the size of the buffer
used in show() to 4 KB.

sysfs usually wants "one value per file".

When there is a huge number of standby memory
ranges, wouldn't it be an issue to display everything in one attribute?

I was rather wondering about a sysfs directory structure that exposes this information.

For example, at the granularity of storage increments we can enable/disable.

In general, it could be a similar structure to /sys/devices/system/memory/ (one directory = one standby storage increment we can enable/disable?), but residing in the s390x specific sysfs area. Or any other way to express ranges that can be enabled/disabled as one unit.
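
Just as a rough sketch of what I have in mind (all names below are made up):

<somewhere s390x-specific>/standby_memory/
  increment0/range        <- e.g. "0x0000000300000000-0x00000003ffffffff"
  increment0/state        <- "standby" / "configured"
  increment0/configure    <- write e.g. "1" (or "memmap_on_memory") to add it
  increment1/...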

I'm not sure if extending /sys/devices/system/memory/ itself would be a good idea, though. It is all very s390x specific.


Or use sysfs binary attributes to overcome the limitation?

Please correct me if I am wrong.

Questions:
1. If we go ahead with this sysfs interface approach to list all standby
memory ranges, could the list be made available via
/sys/devices/system/memory/configurable_memlist?  This could be helpful,
as /sys/devices/system/memory/configure_memory performs architecture
independent checks and could also be useful for other architectures in
the future.

See above, I think we want this s390x specific.


2. Should the new interface also be compatible with lsmem/chmem?

Yes, likely we should allow them to query/configure this s390x specific thing.


3. Or can we have an s390-specific path (e.g.
/sys/firmware/memory/standby_range) to list all standby memory ranges
which are in the deconfigured state, and also use the current design
(max_configurable) to make it easier for the lsmem/chmem tools to detect
these standby memory ranges?

Ah, there it is, yes!


To disable standby memory, one would first offline the memory and then
trigger removal using the arch-specific interface. It is very similar to
dax/kmem's way of handling offline+removal.
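
Roughly (sketch only; the removal trigger below is a made-up name):

# offline all blocks of the range first (generic memory hotplug)
./chmem -d 0x0000000300000000-0x00000003ffffffff
# then deconfigure the standby increment via the arch-specific interface
echo 1 > <somewhere s390x-specific>/standby_memory/increment3/deconfigure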

ok

Now I wonder if dax/kmem could be (ab)used on s390x for standby storage.
Likely a simple sysfs interface could be easier to implement.

I haven't checked dax/kmem in detail yet. I will look into it.

Probably it's not 100% what you want to achieve, just to give you an example of how similar (but different) technologies have solved this problem.

--
Cheers,

David / dhildenb




