[Public] > -----Original Message----- > From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > Sent: Thursday, February 6, 2025 10:51 PM > To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx> > Subject: Re: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info > > > > On 2/7/2025 4:56 AM, Kim, Jonathan wrote: > > [Public] > > > >> -----Original Message----- > >> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > >> Sent: Thursday, February 6, 2025 8:13 AM > >> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Lazar, Lijo <Lijo.Lazar@xxxxxxx> > >> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Kim, Jonathan > >> <Jonathan.Kim@xxxxxxx> > >> Subject: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info > >> > >> Add APIs to initialize XGMI speed, width details and get to max > >> bandwidth supported. It is assumed that a device only supports same > >> generation of XGMI links with uniform width. > >> > >> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx> > >> --- > >> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 41 > ++++++++++++++++++++++++ > >> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 11 +++++++ > >> 2 files changed, 52 insertions(+) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > >> index 03d909c7b14b..edef5763e2b8 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > >> @@ -1679,3 +1679,44 @@ bool amdgpu_xgmi_same_hive(struct > amdgpu_device > >> *adev, > >> adev->gmc.xgmi.hive_id && > >> adev->gmc.xgmi.hive_id == bo_adev->gmc.xgmi.hive_id); > >> } > >> + > >> +static inline uint32_t > >> +__amdgpu_xgmi_get_max_bandwidth(uint8_t link_width, > >> + enum amdgpu_xgmi_link_speed max_speed) > >> +{ > >> + /* Returns unidirectional bandwidth in Mbps */ > >> + return max_speed * 1000 * link_width; > > > > Instead of wrapping this in a new static, would it work to declare a new member > adev->gmc.xgmi.max_bandwidth? > > Then in amdgpu_xgmi_init_info, you can define max_bandwidth directly after the > switch. > > > > Link Width/Link speed is generally the standard way to represent link > info. Presently, tools like SMI show these for PCIe. Followed the same > here and kept a max bandwidth value calculation. GMC xGMI info is for internal driver usage as well. I don’t see the point of doing extra calculations or having unnecessary wrappers for a static piece of info. If you don't want bandwidth in the GMC xGMI info struct, then for your 4th patch (drm/amdgpu: Use xgmi APIs to get bandwidth), just reference speed and link in the caller and do the calculation directly there. Going down a wrapper rabbit hole seems like it will make things harder to maintain in the future. Jon > > >> +} > >> + > >> +/** > >> + * amdgpu_xgmi_get_max_bandwidth - Get max possible bandwidth of a single > >> XGMI > >> + * link of the device in Mbps > >> + * @adev: Target device. > >> + * > >> + * Gets the max possible bandwidth of a single XGMI link of the device in Mbps. > >> + * Assumption is all links of the device have the same width and are of the > >> + * same XGMI generation. > >> + */ > >> +uint32_t amdgpu_xgmi_get_max_bandwidth(struct amdgpu_device *adev) > >> +{ > >> + if (!adev->gmc.xgmi.supported) > >> + return 0; > >> + > >> + return __amdgpu_xgmi_get_max_bandwidth(adev->gmc.xgmi.max_width, > >> adev->gmc.xgmi.max_speed); > > > > If you do the suggested above, both tests can be simplified into 1 line as > > -> return = xgmi_supported ? max_bandwidth : 0; > > > > Yes, primarily width/speed is chosen to represent link info. > > >> +} > >> + > >> +void amdgpu_xgmi_init_info(struct amdgpu_device *adev) > >> +{ > >> + switch (amdgpu_ip_version(adev, XGMI_HWIP, 0)) { > >> + case IP_VERSION(6, 1, 0): > >> + adev->gmc.xgmi.max_speed = XGMI2_5_SPEED_GT; > >> + adev->gmc.xgmi.max_width = 16; > >> + break; > >> + case IP_VERSION(6, 4, 0): > >> + adev->gmc.xgmi.max_speed = XGMI3_SPEED_GT; > >> + adev->gmc.xgmi.max_width = 16; > >> + break; > >> + default: > >> + break; > >> + } > >> +} > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > >> index 044c4f6be44a..384c4ddef78a 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > >> @@ -25,6 +25,12 @@ > >> #include <drm/task_barrier.h> > >> #include "amdgpu_ras.h" > >> > >> +enum amdgpu_xgmi_link_speed { > >> + XGMI2_SPEED_GT = 16, > >> + XGMI2_5_SPEED_GT = 25, > >> + XGMI3_SPEED_GT = 32 > > > > Why not declare and define them as units of Mbps to begin with so you don't have > to do the calculation later? > > Actually, this is also something which is followed from PCIe to > represent as GT/s. If bandwidth field alone is kept, then this also > won't be required. > > I'm not so sure if that is a good way to keep things. > > Thanks, > Lijo > > > > > Jon > > > >> +}; > >> + > >> struct amdgpu_hive_info { > >> struct kobject kobj; > >> uint64_t hive_id; > >> @@ -75,6 +81,8 @@ struct amdgpu_xgmi { > >> struct ras_common_if *ras_if; > >> bool connected_to_cpu; > >> struct amdgpu_xgmi_ras *ras; > >> + enum amdgpu_xgmi_link_speed max_speed; > >> + uint8_t max_width; > >> }; > >> > >> struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev); > >> @@ -102,4 +110,7 @@ int amdgpu_xgmi_request_nps_change(struct > >> amdgpu_device *adev, > >> int amdgpu_get_xgmi_link_status(struct amdgpu_device *adev, > >> int global_link_num); > >> > >> +void amdgpu_xgmi_init_info(struct amdgpu_device *adev); > >> +uint32_t amdgpu_xgmi_get_max_bandwidth(struct amdgpu_device *adev); > >> + > >> #endif > >> -- > >> 2.25.1 > >