Gregory Price <gregory.price@xxxxxxxxxxxx> writes: > On Mon, Dec 11, 2023 at 01:53:40PM +0800, Huang, Ying wrote: >> Hi, Gregory, >> >> Thanks for updated version! >> >> Gregory Price <gourry.memverge@xxxxxxxxx> writes: >> >> > v2: >> > changes / adds: >> > - flattened weight matrix to an array at requested of Ying Huang >> > - Updated ABI docs per Davidlohr Bueso request >> > - change uapi structure to use aligned/fixed-length members as >> > Suggested-by: Arnd Bergmann <arnd@xxxxxxxx> >> > - Implemented weight fetch logic in get_mempolicy2 >> > - mbind2 was changed to take (iovec,len) as function arguments >> > rather than add them to the uapi structure, since they describe >> > where to apply the mempolicy - as opposed to being part of it. >> > >> > The sysfs structure is designed as follows. >> > >> > $ tree /sys/kernel/mm/mempolicy/ >> > /sys/kernel/mm/mempolicy/ >> > ├── possible_nodes >> > └── weighted_interleave >> > ├── nodeN >> > │ └── weight >> > └── nodeN+X >> > └── weight >> > >> > 'mempolicy' is added to '/sys/kernel/mm/' as a control group for >> > the mempolicy subsystem. >> >> Is it good to add 'mempolicy' in '/sys/kernel/mm/numa'? The advantage >> is that 'mempolicy' here is in fact "NUMA mempolicy". The disadvantage >> is one more directory nesting. I have no strong opinion here. >> > > i don't have a strong opinion here. > >> > 'possible_nodes' is added to 'mm/mempolicy' to help describe the >> > expected structures under mempolicy directorys. For example, >> > possible_nodes describes what nodeN directories wille exist under >> > the weighted_interleave directory. >> >> We have '/sys/devices/system/node/possible' already. Is this just a >> duplication? If so, why? And, the possible nodes can be gotten via >> contents of 'weighted_interleave' too. >> > > I'll remove it > >> And it appears not necessary to make 'weighted_interleave/nodeN' >> directory. Why not just make it a file. >> > > Originally I wasn't sure whether there would be more attributes, but > this is probably fine. I'll change it. > >> And, can we add a way to reset weight to the default value? For example >> `echo > nodeN/weight` or `echo > nodeN`. >> > > Seems reasonable. > >> > ===================================================================== >> > (Patches 7-10) set_mempolicy2, get_mempolicy2, mbind2 >> > >> > These interfaces are the 'extended' counterpart to their relatives. >> > They use the userland 'struct mpol_args' structure to communicate a >> > complete mempolicy configuration to the kernel. This structure >> > looks very much like the kernel-internal 'struct mempolicy_args': >> > >> > struct mpol_args { >> > /* Basic mempolicy settings */ >> > __u16 mode; >> > __u16 mode_flags; >> > __s32 home_node; >> > __aligned_u64 pol_nodes; >> > __u64 pol_maxnodes; >> > __u64 addr; >> > __s32 policy_node; >> > __s32 addr_node; >> > __aligned_u64 *il_weights; /* of size pol_maxnodes */ >> > }; >> >> This looks unnecessarily complex. I don't think that it's a good idea >> to use exact same parameter for all 3 syscalls. >> > > It is exactly as complex as mempolicy is. Everything here is already > described in the existing interfaces (except il_weights). > >> For example, can we use something as below? >> >> long set_mempolicy2(int mode, const unsigned long *nodemask, unsigned int *il_weights, >> unsigned long maxnode, unsigned long home_node, >> unsigned long flags); >> >> long mbind2(unsigned long start, unsigned long len, >> int mode, const unsigned long *nodemask, unsigned int *il_weights, >> unsigned long maxnode, unsigned long home_node, >> unsigned long flags); >> > > Your definition of mbind2 is impossible. > > Neither of these interfaces solve the extensibility issue. If a new > policy which requires a new format of data arrives, we can look forward > to set_mempolicy3 and mbind3. IIUC, we will not over-engineering too much. It's hard to predict the requirements in the future. >> A struct may be defined to hold mempolicy iteself. >> >> struct mpol { >> int mode; >> unsigned int home_node; >> const unsigned long *nodemask; >> unsigned int *il_weights; >> unsigned int maxnode; >> }; >> > > addr could be pulled out for get_mempolicy2, so i will do that > > 'addr_node' and 'policy_node' are warts that came from the original > get_mempolicy. Removing them increases the complexity of handling > arguments in the common get_mempolicy code. > > I could probably just drop support for retrieving the addr_node from > get_mempolicy2, since it's already possible with get_mempolicy. So I > will do that. If it's necessary, we can add another struct for get_mempolicy2(). But I don't think that it's necessary to add get_mempolicy2() specific parameters for set_mempolicy2() or mbind2(). -- Best Regards, Huang, Ying