On Mon, 11 Mar 2019 14:16:33 -0600
Keith Busch <kbusch@xxxxxxxxxx> wrote:

> On Mon, Mar 11, 2019 at 04:38:43AM -0700, Jonathan Cameron wrote:
> > On Wed, 27 Feb 2019 15:50:38 -0700
> > Keith Busch <keith.busch@xxxxxxxxx> wrote:
> >
> > > Platforms may provide system memory where some physical address ranges
> > > perform differently than others, or is side cached by the system.
> > The magic 'side cached' term still here in the patch description, ideally
> > wants cleaning up.
> >
> > >
> > > Add documentation describing a high level overview of such systems and the
> > > perforamnce and caching attributes the kernel provides for applications
> > performance
> >
> > > wishing to query this information.
> > >
> > > Reviewed-by: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> > > Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx>
> >
> > A few comments inline. Mostly the weird corner cases that I misunderstood
> > in one of the earlier versions of the code.
> >
> > Whilst I think perhaps that one section could be tweaked a tiny bit I'm basically
> > happy with this if you don't want to.
> >
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> >
> > > ---
> > >  Documentation/admin-guide/mm/numaperf.rst | 164 ++++++++++++++++++++++++++++++
> > >  1 file changed, 164 insertions(+)
> > >  create mode 100644 Documentation/admin-guide/mm/numaperf.rst
> > >
> > > diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst
> > > new file mode 100644
> > > index 000000000000..d32756b9be48
> > > --- /dev/null
> > > +++ b/Documentation/admin-guide/mm/numaperf.rst
> > > @@ -0,0 +1,164 @@
> > > +.. _numaperf:
> > > +
> > > +=============
> > > +NUMA Locality
> > > +=============
> > > +
> > > +Some platforms may have multiple types of memory attached to a compute
> > > +node. These disparate memory ranges may share some characteristics, such
> > > +as CPU cache coherence, but may have different performance.
> > > +For example, different media types and buses affect bandwidth and latency.
> > > +
> > > +A system supports such heterogeneous memory by grouping each memory type
> > > +under different domains, or "nodes", based on locality and performance
> > > +characteristics. Some memory may share the same node as a CPU, and others
> > > +are provided as memory only nodes. While memory only nodes do not provide
> > > +CPUs, they may still be local to one or more compute nodes relative to
> > > +other nodes. The following diagram shows one such example of two compute
> > > +nodes with local memory and a memory only node for each of compute node:
> > > +
> > > + +------------------+     +------------------+
> > > + | Compute Node 0   +-----+ Compute Node 1   |
> > > + | Local Node0 Mem  |     | Local Node1 Mem  |
> > > + +--------+---------+     +--------+---------+
> > > +          |                        |
> > > + +--------+---------+     +--------+---------+
> > > + | Slower Node2 Mem |     | Slower Node3 Mem |
> > > + +------------------+     +--------+---------+
> > > +
> > > +A "memory initiator" is a node containing one or more devices such as
> > > +CPUs or separate memory I/O devices that can initiate memory requests.
> > > +A "memory target" is a node containing one or more physical address
> > > +ranges accessible from one or more memory initiators.
> > > +
> > > +When multiple memory initiators exist, they may not all have the same
> > > +performance when accessing a given memory target. Each initiator-target
> > > +pair may be organized into different ranked access classes to represent
> > > +this relationship.
> >
> > This concept is a bit vague at the moment. Largely because only access0
> > is actually defined. We should definitely keep a close eye on any others
> > that are defined in future to make sure this text is still valid.
> >
> > I can certainly see it being used for different ideas of 'best' rather
> > than simply best and second best etc.
>
> I tried to make the interface flexible to future extension, but I'm
> still not sure how potential users would want to see something like
> all pair-wise attributes, so I had some trouble trying to capture that
> in words.

Agreed, it is definitely non-obvious. We might end up with something totally
different like Jerome is proposing anyway. Let's address this when it happens!

>
> > >  The highest performing initiator to a given target
> > > +is considered to be one of that target's local initiators, and given
> > > +the highest access class, 0. Any given target may have one or more
> > > +local initiators, and any given initiator may have multiple local
> > > +memory targets.
> > > +
> > > +To aid applications matching memory targets with their initiators, the
> > > +kernel provides symlinks to each other. The following example lists the
> > > +relationship for the access class "0" memory initiators and targets, which is
> > > +the of nodes with the highest performing access relationship::
> > > +
> > > +	# symlinks -v /sys/devices/system/node/nodeX/access0/targets/
> > > +	relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
> >
> > So this one perhaps needs a bit more description - I would put it after initiators
> > which precisely fits the description you have here now.
> >
> > "targets contains those nodes for which this initiator is the best possible initiator."
> >
> > which is subtly different from
> >
> > "targets contains those nodes to which this node has the highest
> > performing access characteristics."
> >
> > For example in my test case:
> > * 4 nodes with local memory and cpu, 1 node remote and equidistant from all of the
> >   initiators,
> >
> > targets for the compute nodes contains both themselves and the remote node, to which
> > the characteristics are of course worse.
> > As you pointed out before, we need to look in
> > node0/access0/targets/node0/access0/initiators
> > node0/access0/targets/node4/access0/initiators
> > to get the relevant characteristics and work out that node0 is 'nearer' itself
> > (obviously this is a bit of a silly case, but we could have no memory node0 and
> > be talking about node4 and node5).
> >
> > I am happy with the actual interface, this is just a question about whether we can tweak
> > this text to be slightly clearer.
>
> Sure, I mention this in patch 4's commit message. Probably worth
> repeating here:
>
>   A memory initiator may have multiple memory targets in the same access
>   class. The target memory's initiators in a given class indicate the
>   nodes' access characteristics share the same performance relative to other
>   linked initiator nodes. Each target within an initiator's access class,
>   though, does not necessarily perform the same as each other.

That sounds good to me.
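
For anyone following along, here is a minimal Python sketch of the lookup
discussed above: list the targets of an initiator in a given access class,
then list each target's own initiators in that class. The helper names
(linked_nodes, targets_with_initiators) are hypothetical and only for
illustration. The sysfs path comes from the quoted patch, and the sketch
assumes the initiators directory is laid out like the targets one. It reads
each target's nodeN directory directly rather than following the symlinks,
and it skips entries that are not named nodeN.

```python
import os
import re

# Path taken from the quoted patch; other layouts are not handled here.
NODE_ROOT = "/sys/devices/system/node"


def linked_nodes(root, node, access_class, kind):
    """Return sorted node ids found under nodeN/accessC/<kind>.

    `kind` is "targets" or "initiators". A missing directory yields an
    empty list, since not every node has to expose every relationship.
    """
    path = os.path.join(root, f"node{node}", f"access{access_class}", kind)
    try:
        entries = os.listdir(path)
    except OSError:
        return []
    ids = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match:  # ignore anything that is not a nodeN link
            ids.append(int(match.group(1)))
    return sorted(ids)


def targets_with_initiators(root, initiator, access_class=0):
    """Map each target of `initiator` to that target's own initiator list."""
    return {
        target: linked_nodes(root, target, access_class, "initiators")
        for target in linked_nodes(root, initiator, access_class, "targets")
    }
```

If the interface behaves as described in this thread, calling
targets_with_initiators(NODE_ROOT, 0) on my five-node test case should map
node0 to just itself and node4 to all four compute nodes. Comparing
characteristics between targets would still need the per-target attributes,
which this sketch does not read.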