On 2014-11-26 3:00, Arnd Bergmann wrote:
> On Tuesday 25 November 2014 08:15:47 Ganapatrao Kulkarni wrote:
>>> No, don't hardcode ARM specifics into a common binding either. I've looked
>>> at the ibm,associativity properties again, and I think we should just use
>>> those; they can cover all cases and are completely independent of the
>>> architecture. We should probably discuss the property name, though,
>>> as using the "ibm," prefix might not be the best idea.
>>
>> We started with a new proposal because we could not find enough detail on
>> how ibm/ppc manages NUMA using DT: there is no documentation, the
>> POWER/PAPR spec for NUMA is not in the public domain, and no DT file in
>> arch/powerpc describes NUMA. If we get any of these details, we can align
>> with the powerpc implementation.
>
> Basically the idea is to have an "ibm,associativity" property in each
> bus or device that is node specific, and this includes all CPUs and
> memory nodes. The property contains an array of 32-bit integers that
> count the resources. Take the example of a NUMA cluster of two machines
> with four sockets and four cores each (32 cores total), a memory
> channel on each socket, and one PCI host per board that is connected
> at equal speed to each socket on the board.
>
> The ibm,associativity property in each PCI host, CPU or memory device
> node consequently has an array of three (board, socket, core) integers:
>
>	memory@0,0 {
>		device_type = "memory";
>		reg = <0x0 0x0 0x4 0x0>;
>		/* board 0, socket 0, no specific core */
>		ibm,associativity = <0 0 0xffff>;
>	};
>
>	memory@4,0 {
>		device_type = "memory";
>		reg = <0x4 0x0 0x4 0x0>;
>		/* board 0, socket 1, no specific core */
>		ibm,associativity = <0 1 0xffff>;
>	};
>
>	...
>
>	memory@1c,0 {
>		device_type = "memory";
>		reg = <0x1c 0x0 0x4 0x0>;
>		/* board 1, socket 7, no specific core */
>		ibm,associativity = <1 7 0xffff>;
>	};
>
>	cpus {
>		#address-cells = <2>;
>		#size-cells = <0>;
>
>		cpu@0 {
>			device_type = "cpu";
>			reg = <0 0>;
>			/* board 0, socket 0, core 0 */
>			ibm,associativity = <0 0 0>;
>		};
>
>		cpu@1 {
>			device_type = "cpu";
>			reg = <0 1>;
>			/* board 0, socket 0, core 1 */
>			ibm,associativity = <0 0 1>;
>		};
>
>		...
>
>		cpu@31 {
>			device_type = "cpu";
>			reg = <0 31>;
>			/* board 1, socket 7, core 31 */
>			ibm,associativity = <1 7 31>;
>		};
>	};
>
>	pci@100,0 {
>		device_type = "pci";
>		/* board 0 */
>		ibm,associativity = <0 0xffff 0xffff>;
>		...
>	};
>
>	pci@200,0 {
>		device_type = "pci";
>		/* board 1 */
>		ibm,associativity = <1 0xffff 0xffff>;
>		...
>	};
>
>	ibm,associativity-reference-points = <0 1>;
>
> The "ibm,associativity-reference-points" property here indicates that the
> first entry (the board) of each array marks the most important NUMA boundary
> for this particular system, because the performance impact of allocating
> memory on the remote board is more significant than the impact of using
> memory on a remote socket of the same board. Linux will consequently use
> the first field in the array as the NUMA node ID. If the link between the
> boards however is relatively fast, so you care mostly about allocating
> memory on the same socket, but going to another board isn't much worse
> than going to another socket on the same board, this would be
>
>	ibm,associativity-reference-points = <1 0>;
>
> so Linux would ignore the board ID and use the socket ID as the NUMA node
> number.
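To make the node-id selection described above concrete, here is a minimal C
sketch of how an OS could turn one of these ibm,associativity arrays plus
the reference points into a NUMA node id. It is an illustration only,
assuming the simplified 0-based layout of the example; the function and
constant names are made up and this is not the actual powerpc parsing code.

    /* Sketch only: map an ibm,associativity array to a NUMA node id by
     * taking the entry named by the first (most important) reference
     * point.  Assumes the simplified (board, socket, core) layout and
     * 0-based indices used in the example above.
     */
    #include <stdint.h>

    #define NO_NODE     (-1)
    #define NO_DOMAIN   0xffff   /* "no specific socket/core" marker */

    static int associativity_to_node(const uint32_t *assoc, unsigned int assoc_len,
                                     const uint32_t *ref_points, unsigned int nr_refs)
    {
            uint32_t idx;

            if (nr_refs == 0)
                    return NO_NODE;

            idx = ref_points[0];            /* most important NUMA boundary */
            if (idx >= assoc_len || assoc[idx] == NO_DOMAIN)
                    return NO_NODE;

            return (int)assoc[idx];
    }

With memory@4,0's array <0 1 0xffff>, this returns 0 (the board) for
reference-points = <0 1> and 1 (the socket) for reference-points = <1 0>,
matching the two policies described above.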
> The same would apply if you have only one (otherwise identical) board;
> then you would get
>
>	ibm,associativity-reference-points = <1>;
>
> which means that index 0 is completely irrelevant for NUMA considerations
> and you just care about the socket ID. In this case, devices on the PCI
> bus would also not care about NUMA policy and just allocate buffers from
> anywhere, while in the original example Linux would allocate DMA buffers
> only from the local board.

Thanks for the detailed information. I have a concern about the distance
between NUMA nodes: can the "ibm,associativity-reference-points" property
represent the distance between NUMA nodes?

For example, consider a system with 4 sockets connected like below:

Socket 0 <----> Socket 1 <----> Socket 2 <----> Socket 3

From socket 0 to socket 1 (maybe on the same board), it takes just one hop
to access the memory, but from socket 0 to socket 2 or 3 it takes 2 or 3
hops, so the *distance* is relatively longer. Can the
"ibm,associativity-reference-points" property cover this?

Thanks
Hanjun
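For context on the distance question, which the exchange above does not
settle: one way a kernel can derive node distances from these arrays is to
compare them level by level and double the distance for every
reference-point level at which the two resources differ, loosely modeled on
the per-level doubling used by the powerpc NUMA code. A minimal sketch,
under the same simplified assumptions as before and with made-up names:

    /* Sketch only: derive a SLIT-style distance from two associativity
     * arrays by doubling the base distance for every reference-point
     * level at which the two resources sit in different domains.
     */
    #include <stdint.h>

    #define LOCAL_DISTANCE  10   /* same convention as ACPI SLIT */

    static int node_distance(const uint32_t *assoc_a, const uint32_t *assoc_b,
                             const uint32_t *ref_points, unsigned int nr_refs)
    {
            int distance = LOCAL_DISTANCE;
            unsigned int i;

            for (i = 0; i < nr_refs; i++)
                    if (assoc_a[ref_points[i]] != assoc_b[ref_points[i]])
                            distance *= 2;   /* one more NUMA level apart */

            return distance;
    }

With the two-board example and reference-points = <0 1>, sockets on the
same board come out at 20 and sockets on different boards at 40; in the
4-socket chain above, sockets that share a board all get the same distance
from each other regardless of hop count, which is exactly the limitation
the question is pointing at: the property encodes a hierarchy of
boundaries, not per-hop distances.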