On 2021-10-14 at 2:12 p.m., Jonathan Kim wrote:
> ROCr needs to be able to identify all devices that have direct access to
> fine grain memory, which should include CPUs that are connected to GPUs
> over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the
> CPU is part of the hive.
>
> v2: fixup to ensure all numa nodes get the hive id mapped
>
> Signed-off-by: Jonathan Kim <jonathan.kim@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 98cca5f2b27f..9fda4ee03813 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1296,6 +1296,26 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>
>  	proximity_domain = atomic_inc_return(&topology_crat_proximity_domain);
>
> +	adev = (struct amdgpu_device *)(gpu->kgd);
> +
> +	/* Include the CPU in xGMI hive if xGMI connected by assigning it the hive ID. */
> +	if (gpu->hive_id && adev->gmc.xgmi.connected_to_cpu) {
> +		int i;
> +
> +		for (i = 0; i < proximity_domain; i++) {
> +			struct kfd_topology_device *to_dev =
> +				kfd_topology_device_by_proximity_domain(i);

Sorry, one more nit-pick. This loop is pretty inefficient (O(n^2)) because
kfd_topology_device_by_proximity_domain does a linear search itself. It
would be more efficient to just loop over the topology_device_list
directly here (while holding the read lock):

>     down_read(&topology_lock);
>
>     list_for_each_entry(top_dev, &topology_device_list, list) {
>         ...

Regards,
  Felix

> +
> +			if (!to_dev)
> +				continue;
> +
> +			if (to_dev->gpu)
> +				break;
> +
> +			to_dev->node_props.hive_id = gpu->hive_id;
> +		}
> +	}
> +
>  	/* Check to see if this gpu device exists in the topology_device_list.
>  	 * If so, assign the gpu to that device,
>  	 * else create a Virtual CRAT for this gpu device and then parse that
> @@ -1457,7 +1477,6 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>  		dev->node_props.max_waves_per_simd = 10;
>  	}
>
> -	adev = (struct amdgpu_device *)(dev->gpu->kgd);
>  	/* kfd only concerns sram ecc on GFX and HBM ecc on UMC */
>  	dev->node_props.capability |=
>  		((adev->ras_enabled & BIT(AMDGPU_RAS_BLOCK__GFX)) != 0) ?
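
For illustration only, the single-pass traversal suggested in the review can be modelled in user-space C roughly as below. This is a sketch under stated assumptions, not the kernel change itself: `struct topo_dev` and `assign_cpu_hive_id()` are hypothetical stand-ins for `struct kfd_topology_device` and the loop body, a plain `next` pointer replaces the kernel's `struct list_head`, and locking is omitted (the kernel version would hold `topology_lock` for reading, as suggested above). It also assumes the list is ordered so that CPU nodes come before the first GPU node, which is what the `break` in the quoted patch appears to rely on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct kfd_topology_device. */
struct topo_dev {
	struct topo_dev *next;	/* stand-in for the kernel's struct list_head */
	int is_gpu;		/* non-zero where the real code tests dev->gpu */
	uint64_t hive_id;	/* stand-in for node_props.hive_id */
};

/*
 * Walk the list once and copy hive_id onto every CPU node that precedes
 * the first GPU node. Each node is visited at most once, so the cost is
 * O(n) instead of one linear lookup per proximity domain.
 *
 * Assumption: CPU nodes are listed before GPU nodes.
 * Returns the number of nodes that were updated.
 */
static int assign_cpu_hive_id(struct topo_dev *head, uint64_t hive_id)
{
	struct topo_dev *dev;
	int updated = 0;

	/* Mirrors the gpu->hive_id check in the quoted patch. */
	assert(hive_id != 0);

	for (dev = head; dev; dev = dev->next) {
		if (dev->is_gpu)
			break;

		dev->hive_id = hive_id;
		updated++;
	}

	return updated;
}
```

In this model, a list of two CPU nodes followed by one GPU node gets the hive ID written to both CPU nodes while the GPU node is left untouched.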