Re: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2/7/2025 7:50 PM, Kim, Jonathan wrote:
> [Public]
> 
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
>> Sent: Thursday, February 6, 2025 10:51 PM
>> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
>> Subject: Re: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info
>>
>>
>>
>> On 2/7/2025 4:56 AM, Kim, Jonathan wrote:
>>> [Public]
>>>
>>>> -----Original Message-----
>>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
>>>> Sent: Thursday, February 6, 2025 8:13 AM
>>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Lazar, Lijo <Lijo.Lazar@xxxxxxx>
>>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Kim, Jonathan
>>>> <Jonathan.Kim@xxxxxxx>
>>>> Subject: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info
>>>>
>>>> Add APIs to initialize XGMI speed, width details and get to max
>>>> bandwidth supported. It is assumed that a device only supports same
>>>> generation of XGMI links with uniform width.
>>>>
>>>> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 41
>> ++++++++++++++++++++++++
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 11 +++++++
>>>>  2 files changed, 52 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>>> index 03d909c7b14b..edef5763e2b8 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>>> @@ -1679,3 +1679,44 @@ bool amdgpu_xgmi_same_hive(struct
>> amdgpu_device
>>>> *adev,
>>>>               adev->gmc.xgmi.hive_id &&
>>>>               adev->gmc.xgmi.hive_id == bo_adev->gmc.xgmi.hive_id);
>>>>  }
>>>> +
>>>> +static inline uint32_t
>>>> +__amdgpu_xgmi_get_max_bandwidth(uint8_t link_width,
>>>> +                             enum amdgpu_xgmi_link_speed max_speed)
>>>> +{
>>>> +     /* Returns unidirectional bandwidth in Mbps */
>>>> +     return max_speed * 1000 * link_width;
>>>
>>> Instead of wrapping this in a new static, would it work to declare a new member
>> adev->gmc.xgmi.max_bandwidth?
>>> Then in amdgpu_xgmi_init_info, you can define max_bandwidth directly after the
>> switch.
>>>
>>
>> Link Width/Link speed is generally the standard way to represent link
>> info. Presently, tools like SMI show these for PCIe. Followed the same
>> here and kept a max bandwidth value calculation.
> 
> GMC xGMI info is for internal driver usage as well.
> I don’t see the point of doing extra calculations or having unnecessary wrappers for a static piece of info.
> If you don't want bandwidth in the GMC xGMI info struct, then for your 4th patch (drm/amdgpu: Use xgmi APIs to get bandwidth), just reference speed and link in the caller and do the calculation directly there.
> Going down a wrapper rabbit hole seems like it will make things harder to maintain in the future.
> 
This is again a seggregation problem. It is about keeping xgmi related
information including xgmi bandwidth calculation method in amdgpu_xgmi
files. This is also the reason for moving amdgpu_xgmi struct definitions
from gmc file to xgmi header in the first patch.

In future, this could be called at more places, or xgmi inturn could
obtain through a FW provided method. Keeping at xgmi file may be easier
so that clients always call the same interface.

Thanks,
Lijo

> Jon
> 
>>
>>>> +}
>>>> +
>>>> +/**
>>>> + * amdgpu_xgmi_get_max_bandwidth - Get max possible bandwidth of a single
>>>> XGMI
>>>> + * link of the device in Mbps
>>>> + * @adev: Target device.
>>>> + *
>>>> + * Gets the max possible bandwidth of a single XGMI link of the device in Mbps.
>>>> + * Assumption is all links of the device have the same width and are of the
>>>> + * same XGMI generation.
>>>> + */
>>>> +uint32_t amdgpu_xgmi_get_max_bandwidth(struct amdgpu_device *adev)
>>>> +{
>>>> +     if (!adev->gmc.xgmi.supported)
>>>> +             return 0;
>>>> +
>>>> +     return __amdgpu_xgmi_get_max_bandwidth(adev->gmc.xgmi.max_width,
>>>> adev->gmc.xgmi.max_speed);
>>>
>>> If you do the suggested above, both tests can be simplified into 1 line as
>>> -> return = xgmi_supported ? max_bandwidth : 0;
>>>
>>
>> Yes, primarily width/speed is chosen to represent link info.
>>
>>>> +}
>>>> +
>>>> +void amdgpu_xgmi_init_info(struct amdgpu_device *adev)
>>>> +{
>>>> +     switch (amdgpu_ip_version(adev, XGMI_HWIP, 0)) {
>>>> +     case IP_VERSION(6, 1, 0):
>>>> +             adev->gmc.xgmi.max_speed = XGMI2_5_SPEED_GT;
>>>> +             adev->gmc.xgmi.max_width = 16;
>>>> +             break;
>>>> +     case IP_VERSION(6, 4, 0):
>>>> +             adev->gmc.xgmi.max_speed = XGMI3_SPEED_GT;
>>>> +             adev->gmc.xgmi.max_width = 16;
>>>> +             break;
>>>> +     default:
>>>> +             break;
>>>> +     }
>>>> +}
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>>> index 044c4f6be44a..384c4ddef78a 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>>> @@ -25,6 +25,12 @@
>>>>  #include <drm/task_barrier.h>
>>>>  #include "amdgpu_ras.h"
>>>>
>>>> +enum amdgpu_xgmi_link_speed {
>>>> +     XGMI2_SPEED_GT = 16,
>>>> +     XGMI2_5_SPEED_GT = 25,
>>>> +     XGMI3_SPEED_GT = 32
>>>
>>> Why not declare and define them as units of Mbps to begin with so you don't have
>> to do the calculation later?
>>
>> Actually, this is also something which is followed from PCIe to
>> represent as GT/s. If bandwidth field alone is kept, then this also
>> won't be required.
>>
>> I'm not so sure if that is a good way to keep things.
>>
>> Thanks,
>> Lijo
>>
>>>
>>> Jon
>>>
>>>> +};
>>>> +
>>>>  struct amdgpu_hive_info {
>>>>       struct kobject kobj;
>>>>       uint64_t hive_id;
>>>> @@ -75,6 +81,8 @@ struct amdgpu_xgmi {
>>>>       struct ras_common_if *ras_if;
>>>>       bool connected_to_cpu;
>>>>       struct amdgpu_xgmi_ras *ras;
>>>> +     enum amdgpu_xgmi_link_speed max_speed;
>>>> +     uint8_t max_width;
>>>>  };
>>>>
>>>>  struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
>>>> @@ -102,4 +110,7 @@ int amdgpu_xgmi_request_nps_change(struct
>>>> amdgpu_device *adev,
>>>>  int amdgpu_get_xgmi_link_status(struct amdgpu_device *adev,
>>>>                               int global_link_num);
>>>>
>>>> +void amdgpu_xgmi_init_info(struct amdgpu_device *adev);
>>>> +uint32_t amdgpu_xgmi_get_max_bandwidth(struct amdgpu_device *adev);
>>>> +
>>>>  #endif
>>>> --
>>>> 2.25.1
>>>
> 




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux