On 2/7/2025 7:50 PM, Kim, Jonathan wrote: > [Public] > >> -----Original Message----- >> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> >> Sent: Thursday, February 6, 2025 10:51 PM >> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx> >> Subject: Re: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info >> >> >> >> On 2/7/2025 4:56 AM, Kim, Jonathan wrote: >>> [Public] >>> >>>> -----Original Message----- >>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> >>>> Sent: Thursday, February 6, 2025 8:13 AM >>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Lazar, Lijo <Lijo.Lazar@xxxxxxx> >>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Kim, Jonathan >>>> <Jonathan.Kim@xxxxxxx> >>>> Subject: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info >>>> >>>> Add APIs to initialize XGMI speed, width details and get to max >>>> bandwidth supported. It is assumed that a device only supports same >>>> generation of XGMI links with uniform width. >>>> >>>> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 41 >> ++++++++++++++++++++++++ >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 11 +++++++ >>>> 2 files changed, 52 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >>>> index 03d909c7b14b..edef5763e2b8 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >>>> @@ -1679,3 +1679,44 @@ bool amdgpu_xgmi_same_hive(struct >> amdgpu_device >>>> *adev, >>>> adev->gmc.xgmi.hive_id && >>>> adev->gmc.xgmi.hive_id == bo_adev->gmc.xgmi.hive_id); >>>> } >>>> + >>>> +static inline uint32_t >>>> +__amdgpu_xgmi_get_max_bandwidth(uint8_t link_width, >>>> + enum amdgpu_xgmi_link_speed max_speed) >>>> +{ >>>> + /* Returns unidirectional bandwidth in Mbps */ >>>> + return max_speed * 1000 * link_width; >>> >>> Instead of wrapping this in a new static, would it work to declare a new member >> adev->gmc.xgmi.max_bandwidth? >>> Then in amdgpu_xgmi_init_info, you can define max_bandwidth directly after the >> switch. >>> >> >> Link Width/Link speed is generally the standard way to represent link >> info. Presently, tools like SMI show these for PCIe. Followed the same >> here and kept a max bandwidth value calculation. > > GMC xGMI info is for internal driver usage as well. > I don’t see the point of doing extra calculations or having unnecessary wrappers for a static piece of info. > If you don't want bandwidth in the GMC xGMI info struct, then for your 4th patch (drm/amdgpu: Use xgmi APIs to get bandwidth), just reference speed and link in the caller and do the calculation directly there. > Going down a wrapper rabbit hole seems like it will make things harder to maintain in the future. > This is again a seggregation problem. It is about keeping xgmi related information including xgmi bandwidth calculation method in amdgpu_xgmi files. This is also the reason for moving amdgpu_xgmi struct definitions from gmc file to xgmi header in the first patch. In future, this could be called at more places, or xgmi inturn could obtain through a FW provided method. Keeping at xgmi file may be easier so that clients always call the same interface. Thanks, Lijo > Jon > >> >>>> +} >>>> + >>>> +/** >>>> + * amdgpu_xgmi_get_max_bandwidth - Get max possible bandwidth of a single >>>> XGMI >>>> + * link of the device in Mbps >>>> + * @adev: Target device. >>>> + * >>>> + * Gets the max possible bandwidth of a single XGMI link of the device in Mbps. >>>> + * Assumption is all links of the device have the same width and are of the >>>> + * same XGMI generation. >>>> + */ >>>> +uint32_t amdgpu_xgmi_get_max_bandwidth(struct amdgpu_device *adev) >>>> +{ >>>> + if (!adev->gmc.xgmi.supported) >>>> + return 0; >>>> + >>>> + return __amdgpu_xgmi_get_max_bandwidth(adev->gmc.xgmi.max_width, >>>> adev->gmc.xgmi.max_speed); >>> >>> If you do the suggested above, both tests can be simplified into 1 line as >>> -> return = xgmi_supported ? max_bandwidth : 0; >>> >> >> Yes, primarily width/speed is chosen to represent link info. >> >>>> +} >>>> + >>>> +void amdgpu_xgmi_init_info(struct amdgpu_device *adev) >>>> +{ >>>> + switch (amdgpu_ip_version(adev, XGMI_HWIP, 0)) { >>>> + case IP_VERSION(6, 1, 0): >>>> + adev->gmc.xgmi.max_speed = XGMI2_5_SPEED_GT; >>>> + adev->gmc.xgmi.max_width = 16; >>>> + break; >>>> + case IP_VERSION(6, 4, 0): >>>> + adev->gmc.xgmi.max_speed = XGMI3_SPEED_GT; >>>> + adev->gmc.xgmi.max_width = 16; >>>> + break; >>>> + default: >>>> + break; >>>> + } >>>> +} >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >>>> index 044c4f6be44a..384c4ddef78a 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >>>> @@ -25,6 +25,12 @@ >>>> #include <drm/task_barrier.h> >>>> #include "amdgpu_ras.h" >>>> >>>> +enum amdgpu_xgmi_link_speed { >>>> + XGMI2_SPEED_GT = 16, >>>> + XGMI2_5_SPEED_GT = 25, >>>> + XGMI3_SPEED_GT = 32 >>> >>> Why not declare and define them as units of Mbps to begin with so you don't have >> to do the calculation later? >> >> Actually, this is also something which is followed from PCIe to >> represent as GT/s. If bandwidth field alone is kept, then this also >> won't be required. >> >> I'm not so sure if that is a good way to keep things. >> >> Thanks, >> Lijo >> >>> >>> Jon >>> >>>> +}; >>>> + >>>> struct amdgpu_hive_info { >>>> struct kobject kobj; >>>> uint64_t hive_id; >>>> @@ -75,6 +81,8 @@ struct amdgpu_xgmi { >>>> struct ras_common_if *ras_if; >>>> bool connected_to_cpu; >>>> struct amdgpu_xgmi_ras *ras; >>>> + enum amdgpu_xgmi_link_speed max_speed; >>>> + uint8_t max_width; >>>> }; >>>> >>>> struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev); >>>> @@ -102,4 +110,7 @@ int amdgpu_xgmi_request_nps_change(struct >>>> amdgpu_device *adev, >>>> int amdgpu_get_xgmi_link_status(struct amdgpu_device *adev, >>>> int global_link_num); >>>> >>>> +void amdgpu_xgmi_init_info(struct amdgpu_device *adev); >>>> +uint32_t amdgpu_xgmi_get_max_bandwidth(struct amdgpu_device *adev); >>>> + >>>> #endif >>>> -- >>>> 2.25.1 >>> >