On Mon, Dec 04, 2023 at 04:19:02PM +0800, Huang, Ying wrote:
> Gregory Price <gregory.price@xxxxxxxxxxxx> writes:
>
> > If the structure is built as a matrix of (cpu_node,mem_nodes),
> > then you can also optimize based on the node the task is running on.
>
> The matrix stuff makes the situation complex.  If people do need
> something like that, they can just use set_memorypolicy2() with user
> specified weights.  I still believe that "make simple stuff simple, and
> complex stuff possible".
>

I don't think it's particularly complex, since we already have a
distance matrix for NUMA nodes:

    available: 2 nodes (0-1)
    ... snip ...
    node distances:
    node   0   1
      0:  10  21
      1:  21  10

This would follow the same thing, just adjustable for bandwidth.

I personally find the (src,dst) matrix very important for flexibility.

But if there is particular pushback against it, having a one-dimensional
array is better than not having it, so I will take what I can get.

> > That feels very intuitive, deals with many race condition issues, and
> > the global setting can actually be implemented without the need for
> > set_mempolicy2 at all - which is certainly a bonus.
> >
> > Would love more thoughts here.  Will have a new RFC with set_mempolicy2,
> > mbind2, and MPOL_WEIGHTED_INTERLEAVE soon that demonstrate the above.
>
> Thanks for doing all these!
>

Someone's got to :]

~Gregory
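
For illustration only, here is a minimal user-space sketch of how a
(src,dst) weight matrix could drive weighted interleave, laid out the
same way as the distance matrix above. This is not code from the RFC or
the kernel: the names weight_matrix, il_state and next_node, and the
weight values themselves, are all hypothetical. Collapsing the matrix to
a single row gives the one-dimensional variant.

```c
#include <assert.h>

#define NR_NODES 2

/*
 * Hypothetical weights, NOT taken from any patch:
 * row = node the task is running on, column = memory node.
 */
static const unsigned char weight_matrix[NR_NODES][NR_NODES] = {
	{ 3, 1 },	/* task on node 0: 3 pages local, 1 page remote */
	{ 1, 3 },	/* task on node 1: 1 page on node 0, 3 pages local */
};

/* Per-task interleave cursor (hypothetical). */
struct il_state {
	int node;		/* memory node currently being filled */
	unsigned int used;	/* pages already placed on that node this round */
};

/* Pick the memory node for the next page, given where the task runs. */
static int next_node(struct il_state *st, int cpu_node)
{
	assert(cpu_node >= 0 && cpu_node < NR_NODES);

	/* Visit each node at most once more after the current one. */
	for (int tries = 0; tries <= NR_NODES; tries++) {
		if (st->used < weight_matrix[cpu_node][st->node]) {
			st->used++;
			return st->node;
		}
		/* Weight exhausted (or zero): move on to the next node. */
		st->node = (st->node + 1) % NR_NODES;
		st->used = 0;
	}

	/* Every weight in this row is zero: fall back to the local node. */
	return cpu_node;
}
```

Starting from a zeroed il_state, repeated calls to next_node(&st, 0)
place three pages on node 0 and then one on node 1 before repeating,
while the same calls with cpu_node 1 use the second row instead.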