Wei Xu <weixugc@xxxxxxxxxx> writes:

> On Thu, May 12, 2022 at 12:12 AM Aneesh Kumar K V
> <aneesh.kumar@xxxxxxxxxxxxx> wrote:
>>
>> On 5/12/22 12:33 PM, ying.huang@xxxxxxxxx wrote:
>> > On Wed, 2022-05-11 at 23:22 -0700, Wei Xu wrote:
>> >> Sysfs Interfaces
>> >> ================
>> >>
>> >> * /sys/devices/system/memtier/memtierN/nodelist
>> >>
>> >>   where N = 0, 1, 2 (the kernel supports only 3 tiers for now).
>> >>
>> >>   Format: node_list
>> >>
>> >>   Read-only.  When read, list the memory nodes in the specified tier.
>> >>
>> >>   Tier 0 is the highest tier, while tier 2 is the lowest tier.
>> >>
>> >>   The absolute value of a tier id number has no specific meaning.
>> >>   What matters is the relative order of the tier id numbers.
>> >>
>> >>   When a memory tier has no nodes, the kernel can hide its memtier
>> >>   sysfs files.
>> >>
>> >> * /sys/devices/system/node/nodeN/memtier
>> >>
>> >>   where N = 0, 1, ...
>> >>
>> >>   Format: int or empty
>> >>
>> >>   When read, list the memory tier that the node belongs to.  Its value
>> >>   is empty for a CPU-only NUMA node.
>> >>
>> >>   When written, the kernel moves the node into the specified memory
>> >>   tier if the move is allowed.  The tier assignment of all other nodes
>> >>   are not affected.
>> >>
>> >>   Initially, we can make this interface read-only.
>> >
>> > It seems that "/sys/devices/system/node/nodeN/memtier" has all
>> > information we needed.  Do we really need
>> > "/sys/devices/system/memtier/memtierN/nodelist"?
>> >
>> > That can be gotten via a simple shell command line,
>> >
>> > $ grep . /sys/devices/system/node/nodeN/memtier | sort -n -k 2 -t ':'
>> >
>>
>> It will be really useful to fetch the memory tier node list in an easy
>> fashion rather than reading multiple sysfs directories.
>> If we don't have other attributes for memorytier, we could keep
>> "/sys/devices/system/memtier/memtierN" a NUMA node list there by
>> avoiding /sys/devices/system/memtier/memtierN/nodelist
>>
>> -aneesh
>
> It is harder to implement memtierN as just a file and doesn't follow
> the existing sysfs pattern, either.  Besides, it is extensible to have
> memtierN as a directory.

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 6248326f944d..251f38ec3816 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -1097,12 +1097,49 @@ static struct attribute *node_state_attrs[] = {
 	NULL
 };
 
+#define MAX_TIER 3
+nodemask_t memory_tier[MAX_TIER];
+
+#define _TIER_ATTR_RO(name, tier_index) \
+	{ __ATTR(name, 0444, show_tier, NULL), tier_index, NULL }
+
+struct memory_tier_attr {
+	struct device_attribute attr;
+	int tier_index;
+	int (*write)(nodemask_t nodes);
+};
+
+static ssize_t show_tier(struct device *dev,
+			 struct device_attribute *attr, char *buf)
+{
+	struct memory_tier_attr *mt = container_of(attr, struct memory_tier_attr, attr);
+
+	return sysfs_emit(buf, "%*pbl\n",
+			  nodemask_pr_args(&memory_tier[mt->tier_index]));
+}
+
 static const struct attribute_group memory_root_attr_group = {
 	.attrs = node_state_attrs,
 };
 
+
+#define TOP_TIER 0
+static struct memory_tier_attr memory_tiers[] = {
+	[0] = _TIER_ATTR_RO(memory_top_tier, TOP_TIER),
+};
+
+static struct attribute *memory_tier_attrs[] = {
+	&memory_tiers[0].attr.attr,
+	NULL
+};
+
+static const struct attribute_group memory_tier_attr_group = {
+	.attrs = memory_tier_attrs,
+};
+
 static const struct attribute_group *cpu_root_attr_groups[] = {
 	&memory_root_attr_group,
+	&memory_tier_attr_group,
 	NULL,
 };

As long as we have the ability to see the nodelist, I am good with the
proposal.

-aneesh
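
To make the relation between the two proposed views concrete, here is a
rough userspace sketch (not kernel code, and not part of the patch
above) of how a per-node tier table, i.e. what nodeN/memtier would
report, collapses into the per-tier node lists that show_tier() prints
with "%*pbl".  The names nodemask64, NO_TIER, build_tier_masks() and
format_nodelist() are made up for illustration, and a single 64-bit
word stands in for nodemask_t, so it only handles up to 64 nodes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_TIER  3     /* same tier count as in the patch */
#define MAX_NODES 64    /* assumption: one 64-bit word is enough here */
#define NO_TIER   (-1)  /* hypothetical marker for a CPU-only node */

/* Userspace stand-in for the kernel's nodemask_t: one bit per node. */
typedef uint64_t nodemask64;

/* Turn a node -> tier table into one node mask per tier. */
static void build_tier_masks(const int *node_tier, int nr_nodes,
			     nodemask64 masks[MAX_TIER])
{
	assert(nr_nodes >= 0 && nr_nodes <= MAX_NODES);
	memset(masks, 0, sizeof(nodemask64) * MAX_TIER);

	for (int n = 0; n < nr_nodes; n++) {
		int t = node_tier[n];

		/* Nodes without a valid tier (e.g. NO_TIER) are skipped. */
		if (t >= 0 && t < MAX_TIER)
			masks[t] |= (nodemask64)1 << n;
	}
}

/*
 * Format a mask as a range list such as "0-1,4", similar in spirit to
 * the kernel's "%*pbl".  Returns the string length, or -1 if buf is
 * too small.
 */
static int format_nodelist(nodemask64 mask, char *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	while (n < MAX_NODES) {
		int start, w;

		if (!((mask >> n) & 1)) {
			n++;
			continue;
		}

		/* Extend the run while the next node is also set. */
		start = n;
		while (n + 1 < MAX_NODES && ((mask >> (n + 1)) & 1))
			n++;

		if (start == n)
			w = snprintf(buf + off, len - off, "%s%d",
				     off ? "," : "", start);
		else
			w = snprintf(buf + off, len - off, "%s%d-%d",
				     off ? "," : "", start, n);
		if (w < 0 || (size_t)w >= len - off)
			return -1;

		off += (size_t)w;
		n++;
	}

	return (int)off;
}
```

With a table like { 0, 0, 1, NO_TIER, 1, 2 }, build_tier_masks() puts
nodes 0 and 1 in tier 0, nodes 2 and 4 in tier 1, and node 5 in tier 2,
and format_nodelist() renders each mask as a node list.  Either view
can be derived from the other; keeping the mask per tier, as the patch
does, just makes the nodelist read a single lookup.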