On Mon, Nov 26, 2018 at 11:00:09PM -0800, Dan Williams wrote:
> On Wed, Nov 14, 2018 at 2:53 PM Keith Busch <keith.busch@xxxxxxxxx> wrote:
> >
> > Heterogeneous memory systems provide memory nodes with latency
> > and bandwidth performance attributes that are different from other
> > nodes. Create an interface for the kernel to register these attributes
> > under the node that provides the memory. If the system provides this
> > information, applications can query the node attributes when deciding
> > which node to request memory.
> >
> > When multiple memory initiators exist, accessing the same memory target
> > from each may not perform the same as the other. The highest performing
> > initiator to a given target is considered to be a local initiator for
> > that target. The kernel provides performance attributes only for the
> > local initiators.
> >
> > The memory's compute node should be symlinked in sysfs as one of the
> > node's initiators.
> >
> > The following example shows the new sysfs hierarchy for a node exporting
> > performance attributes:
> >
> >   # tree /sys/devices/system/node/nodeY/initiator_access
> >   /sys/devices/system/node/nodeY/initiator_access
> >   |-- read_bandwidth
> >   |-- read_latency
> >   |-- write_bandwidth
> >   `-- write_latency
>
> With the expectation that there will be nodes that are initiator-only,
> target-only, or both I think this interface should indicate that. The
> 1:1 "local" designation of HMAT should not be directly encoded in the
> interface, it's just a shortcut for finding at least one initiator in
> the set that can realize the advertised performance. At least if the
> interface can enumerate the set of initiators then it becomes clear
> whether sysfs can answer a performance enumeration question or if the
> application needs to consult an interface with specific knowledge of a
> given initiator-target pairing.
>
> It seems a precursor to these patches is arranges for offline node
> devices to be created for the ACPI proximity domains that are
> offline-by default for reserved memory ranges.

The intention is that all initiators symlinked to the memory node share
the initiator_access attributes, including the node itself, since the
node is its own initiator. There's no limit to how many initiators the
new kernel interface in patch 1/7 allows you to register, so it's not
really a 1:1 relationship.

Either instead of or in addition to the symlinks, we can export a
node_mask in the initiator_access directory showing the nodes to which
these access attributes apply, if that makes the intention clearer.
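
To make the userspace side concrete, here is a rough sketch (not part of
the patch set) of how an application might read the four attributes and
decode a node_mask. The helper names are hypothetical, the attribute
files are assumed to hold one decimal integer each, and node_mask is
assumed to use the usual sysfs bitmap format (comma-separated 32-bit hex
words, most significant first). None of that is settled by this thread.

```python
import os

# Attribute file names taken from the sysfs hierarchy quoted above.
ATTRS = ("read_bandwidth", "read_latency", "write_bandwidth", "write_latency")


def read_initiator_access(node_dir):
    """Return the attributes under <node_dir>/initiator_access as ints.

    Assumption: each file holds a single decimal integer. Units are not
    stated in this thread, so none are implied here.
    """
    access_dir = os.path.join(node_dir, "initiator_access")
    values = {}
    for name in ATTRS:
        with open(os.path.join(access_dir, name)) as f:
            values[name] = int(f.read().strip())
    return values


def parse_node_mask(text):
    """Decode a sysfs-style hex bitmap such as "00000000,00000005".

    Hypothetical: the node_mask file is only a proposal, and its format
    is assumed to match other sysfs bitmaps. Returns the set bit
    positions (node ids) in ascending order.
    """
    bits = int(text.strip().replace(",", ""), 16)
    return [n for n in range(bits.bit_length()) if (bits >> n) & 1]
```

An application would call read_initiator_access() on a memory node's
directory under /sys/devices/system/node/ and, if node_mask is exported,
use parse_node_mask() on its contents to learn which initiators the
values apply to.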