On Wed, Nov 13, 2019 at 9:49 AM Jonathan Cameron
<jonathan.cameron@xxxxxxxxxx> wrote:
>
> On Wed, 13 Nov 2019 21:57:24 +0800
> Tao Xu <tao3.xu@xxxxxxxxx> wrote:
>
> > On 11/13/2019 5:47 PM, Jonathan Cameron wrote:
> > > On Tue, 12 Nov 2019 09:55:17 -0800
> > > Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > >
> > >> [ add Tao Xu ]
> > >>
> > >> On Fri, Oct 4, 2019 at 4:45 AM Jonathan Cameron
> > >> <Jonathan.Cameron@xxxxxxxxxx> wrote:
> > >>>
> > >>> Generic Initiators are a new ACPI concept that allows for the
> > >>> description of proximity domains that contain a device which
> > >>> performs memory access (such as a network card) but neither
> > >>> host CPU nor Memory.
> > >>>
> > >>> This patch has the parsing code and provides the infrastructure
> > >>> for an architecture to associate these new domains with their
> > >>> nearest memory processing node.
> > >>
> > >> Thanks for this Jonathan. May I ask how this was tested? Tao has been
> > >> working on qemu support for HMAT [1]. I have not checked if it already
> > >> supports generic initiator entries, but it would be helpful to include
> > >> an example of how the kernel sees these configurations in practice.
> > >>
> > >> [1]: http://patchwork.ozlabs.org/cover/1096737/
> > >
> > > Tested against qemu with SRAT and SLIT table overrides from an
> > > initrd to actually create the node and give it distances
> > > (those all turn up correctly in the normal places). DSDT override
> > > used to move an emulated network card into the GI numa node. That
> > > currently requires the PCI patch referred to in the cover letter.
> > > On arm64 tested both on qemu and real hardware (overrides on tables
> > > even for real hardware as I can't persuade our BIOS team to implement
> > > Generic Initiators until an OS is actually using them.)
> > >
> > > Main real requirement is that memory allocations then occur from one
> > > of the nodes at the minimal distance when you do a devm_ allocation
> > > from a device assigned to the GI node. Also need to be able to query
> > > the distances to allow load balancing etc. All that works as expected.
> > >
> > > It only has a fairly tangential connection to HMAT in that HMAT
> > > can provide information on GI nodes. Given HMAT code is quite happy
> > > with memoryless nodes anyway it should work. QEMU doesn't currently
> > > have support to create GI SRAT entries let alone HMAT using them.
> > >
> > > Whilst I could look at adding such support to QEMU, it's not
> > > exactly high priority to emulate something we can test easily
> > > by overriding the tables before the kernel reads them.
> > >
> > > I'll look at how hard it is to build an HMAT table for my test
> > > configs based on the ones I used to test your HMAT patches a while
> > > back. Should be easy if tedious.
> > >
> > > Jonathan
> > >
> > Indeed, HMAT can support Generic Initiator, but as far as I know, QEMU
> > can only emulate a node with cpu and memory, or memory-only. Even if we
> > assign a node with cpu only, qemu will raise an error. Considering
> > compatibility, there is a lot of work to do for QEMU if we change NUMA
> > or SRAT table.
> >
>
> I faked up a quick HMAT table.
>
> Used a configuration with 3x CPU and memory nodes, 1x memory only node
> and 1x GI node. Two test cases, one where the GI initiator is further than
> the CPU containing nodes from the memory only node (realistic case for
> existing hardware). That behaves as expected and there are no
> /sys/bus/node/devices/nodeX/access0 entries for the GI node
> + appropriate ones for the memory only node as normal.
>
> The other case is more interesting: we have the memory only node nearer
> to the GI node than to any of the CPUs.
> In that case for x86 at least
> the HMAT code is happy to put an access0 directory in the GI node
> with empty access0/initiators and the memory node under access0/targets
>
> The memory only node is node4 and the GI node node3.
>
> So relevant dirs under /sys/bus/node/devices
>
> node3/access0/initiators/ Empty
> node3/access0/targets/node4

This makes sense: node3 is an initiator, so no other nodes can initiate to it.

> node4/access0/initiators/[node3 read_bandwidth write_bandwidth etc]
> node4/access0/targets/ Empty
>
> So the result currently (I think - the HMAT interface still confuses
> me :) is that a GI node is treated like a CPU node. This might mean
> there is no useful information available if you want to figure out
> which CPU containing node is nearest to Memory when the GI node is
> nearer still.
>
> Is this a problem? I'm not sure...
>
> If we don't want to include GI nodes then we can possibly
> use the node_state(x, N_CPU) method to check before considering
> them, or I guess parse SRAT to extract that info directly.
>
> I tried this and it seems to work so can add a patch doing this
> in the next version if we think this is the 'right' thing to do.
>
> So what do you think 'should' happen?

I think this might be our first case for adding an "access1" instance
by default. I.e. in the case when the access0 initiator is not a cpu,
then access1 is there to at least show the "local" cpu and let userspace
see the performance difference of cpu vs a specific-initiator access.
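
To make the access0 vs access1 idea concrete, here is a rough userspace
model in Python. It is only a sketch of the proposal, not kernel code:
the node numbering follows the test setup quoted above (node0-2 with CPU
and memory, node3 the GI node, node4 memory only), but the distance
values, the DISTANCE table and the helper names best_initiators() and
access_classes() are all made up for illustration. The real HMAT code
ranks initiators by the performance attributes in the table rather than
by a single distance number.

```python
# Hypothetical model of the proposed access0/access1 split.
# Node ids follow the test configuration in this thread; every
# distance value below is invented for illustration only.
CPU_NODES = {0, 1, 2}   # nodes containing CPUs (and memory)
GI_NODES = {3}          # generic initiator node, no CPU, no memory

# DISTANCE[initiator][target]; smaller means nearer (SLIT-like).
DISTANCE = {
    0: {0: 10, 1: 20, 2: 20, 4: 30},
    1: {0: 20, 1: 10, 2: 20, 4: 40},
    2: {0: 20, 1: 20, 2: 10, 4: 40},
    3: {0: 30, 1: 40, 2: 40, 4: 15},
}


def best_initiators(target, candidates):
    """Return the sorted candidate initiators at minimal distance to target."""
    best = min(DISTANCE[i][target] for i in candidates)
    return sorted(i for i in candidates if DISTANCE[i][target] == best)


def access_classes(target):
    """Model the initiators that would be linked for one memory target."""
    return {
        # access0: nearest initiator of any kind (CPU or GI)
        "access0": best_initiators(target, CPU_NODES | GI_NODES),
        # access1: nearest CPU-containing node only
        "access1": best_initiators(target, CPU_NODES),
    }


if __name__ == "__main__":
    for node in (0, 4):
        print(f"node{node}:", access_classes(node))
```

With these invented distances, node4 gets node3 (the GI node) as its
access0 initiator and node0 as its access1 initiator, so userspace can
still find the nearest CPU even when a GI node is nearer. For node0 both
classes point at node0 itself.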