On 2021-10-14 at 1:44 p.m., Jonathan Kim wrote:
> ROCr needs to be able to identify all devices that have direct access to
> fine grain memory, which should include CPUs that are connected to GPUs
> over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the
> CPU is part of the hive.
>
> Signed-off-by: Jonathan Kim <jonathan.kim@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 98cca5f2b27f..d04c48dfd72b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1296,6 +1296,27 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>
>  	proximity_domain = atomic_inc_return(&topology_crat_proximity_domain);
>
> +	adev = (struct amdgpu_device *)(gpu->kgd);
> +
> +	/* Include the CPU in xGMI hive if xGMI connected by assigning it the hive ID. */
> +	if (gpu->hive_id && adev->gmc.xgmi.connected_to_cpu) {
> +		int i;
> +
> +		for (i = 0; i < proximity_domain; i++) {
> +			struct kfd_topology_device *to_dev =
> +				kfd_topology_device_by_proximity_domain(i);
> +
> +			if (!to_dev)
> +				continue;
> +
> +			if (to_dev->gpu)
> +				break;
> +
> +			to_dev->node_props.hive_id = gpu->hive_id;
> +			break;

On a NUMA system there will be multiple CPU nodes (e.g. in NPS-4 mode).
The "break" statement here means you'll only update the hive ID on the
first NUMA node.

Other than that, this change makes sense.

Regards,
  Felix


> +		}
> +	}
> +
>  	/* Check to see if this gpu device exists in the topology_device_list.
>  	 * If so, assign the gpu to that device,
>  	 * else create a Virtual CRAT for this gpu device and then parse that
> @@ -1457,7 +1478,6 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>  		dev->node_props.max_waves_per_simd = 10;
>  	}
>
> -	adev = (struct amdgpu_device *)(dev->gpu->kgd);
>  	/* kfd only concerns sram ecc on GFX and HBM ecc on UMC */
>  	dev->node_props.capability |=
>  		((adev->ras_enabled & BIT(AMDGPU_RAS_BLOCK__GFX)) != 0) ?