On Mon, 2022-06-06 at 14:32 +0530, Aneesh Kumar K V wrote: > On 6/6/22 2:22 PM, Ying Huang wrote: > .... > > > > > I can move the patch "mm/demotion/dax/kmem: Set node's memory tier to > > > > > MEMORY_TIER_PMEM" before switching the demotion logic so that on systems > > > > > with two memory tiers (DRAM and pmem) the demotion continues to work > > > > > as expected after patch 3 ("mm/demotion: Build demotion targets based on > > > > > explicit memory tiers"). With that, there will not be any regression in > > > > > between the patch series. > > > > > > > > > > > > > Thanks! Please do that. And I think you can add sysfs interface after > > > > that patch too. That is, in [1/7] > > > > > > > > > > I am not sure why you insist on moving sysfs interfaces later. They are > > > introduced based on the helper added. It make patch review easier to > > > look at both the helpers and the user of the helper together in a patch. > > > > Yes. We should introduce a function and its user in one patch for > > review. But this doesn't mean that we should introduce the user space > > interface as the first step. I think the user space interface should > > output correct information when we expose it. > > > > If you look at this patchset we are not exposing any wrong information. > > patch 1 -> adds ability to register the memory tiers and expose details > of registered memory tier. At this point the patchset only support DRAM > tier and hence only one tier is shown But inside kernel, we actually work with 2 tiers and demote/prmote pages between them. With the information from your interface, users would think that there is no any demotion/promotion in kernel because there's only 1 tier. > patch 2 -> adds per node memtier attribute. So only DRAM nodes shows the > details, because the patchset yet has not introduced a slower memory > tier like PMEM. > > patch 4 -> introducing demotion. Will make that patch 5 > > patch 5 -> add dax kmem numa nodes as slower memory tier. Now this > becomes patch 4 at which point we will correctly show two memory tiers > in the system. > > > > > > +struct memory_tier { > > > > + nodemask_t nodelist; > > > > +}; > > > > > > > > And struct device can be added after the kernel has switched the > > > > implementation based on explicit memory tiers. > > > > > > > > +struct memory_tier { > > > > + struct device dev; > > > > + nodemask_t nodelist; > > > > +}; > > > > > > > > > > > > > Can you elaborate on this? or possibly review the v5 series indicating > > > what change you are suggesting here? > > > > > > > > > > But I don't think it's a good idea to have "struct device" embedded in > > > > "struct memory_tier". We don't have "struct device" embedded in "struct > > > > pgdata_list"... > > > > > > > > > > I avoided creating an array for memory_tier (memory_tier[]) so that we > > > can keep it dynamic. Keeping dev embedded in struct memory_tier simplify > > > the life cycle management of that dynamic list. We free the struct > > > memory_tier allocation via device release function (memtier->dev.release > > > = memory_tier_device_release ) > > > > > > Why do you think it is not a good idea? > > > > I think that we shouldn't bind our kernel internal implementation with > > user space interface too much. Yes. We can expose kernel internal > > implementation to user space in a direct way. I suggest you to follow > > the style of "struct pglist_data" and "struct node". If we decouple > > "struct memory_tier" and "struct memory_tier_dev" (or some other name), > > we can refer to "struct memory_tier" without depending on all device > > core. Memory tier should be accessible inside the kernel even without a > > user interface. And memory tier isn't a device in concept. > > > > memory_tiers are different from pglist_data and struct node in that we > also allow the creation of them from userspace. I don't think that there's much difference. struct pglist_data and struct node can be created/destroyed dynamically too. Please take a look at __try_online_node() register_one_node() try_offline_node() unregister_one_node() > That is the life time of > a memory tier is driven from userspace and it is much easier to manage > them via sysfs file lifetime mechanism rather than inventing an > independent and more complex way of doing the same. You needs to manage the lifetime of struct memory_tier in kernel too. Because there are kernel users. And even if you use device core lifetime mechanism, you don't need to embed struct device in struct memory_tier too, you can free "separate" struct memory_tier in "release" callback of struct device. > > For life cycle management, I think that we can do that without sysfs > > too. > > > > unless there are specific details that you think will be broken by > embedding struct device inside struct memory_tier, IMHO I still consider > the embedded implementation much simpler and in accordance with other > kernel design patterns. In concept, struct memory_tier isn't a device. Although we expose it as a device in sysfs. That's just an implementation detail. So I think it's better to make struct memory_tier independent of struct device if possible. Via not embeding struct device in struct memory_tier, it's much easier to dereference struct memory_tier directly in inline function in ".h". We don't need to introduce one accessor function for each field of struct memory_tier for that. Best Regards, Huang, Ying