Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Public]


Another try.. if it helps (you or someone else)

This series introduces two functions for maintenance.

amdgpu_xgmi_init_info - This is to initialize any XGM related static information. Now it's called as soon as XGMI version is discovered. Now, if that is causing some confusion, then I could rename to xgmi_early_init and call from device early init. The intent of the function is to initialise any static info related to XGMI.

amdgpu_xgmi_get_max_bandwidth - Assumes all links are uniform and provides the max theoretical bandwidth. Currently, the calculation is simple as width * speed. In future, this may change based on IP version like speed * width * x_factor or get the bandwidth from FW etc. Caller is expected to get this uniform interface for any XGMI IP version.

And lastly, both functions are maintained in amdgpu_xgmi.c

Thanks,
Lijo

From: Kim, Jonathan <Jonathan.Kim@xxxxxxx>
Sent: Friday, February 7, 2025 9:58:30 PM
To: Lazar, Lijo <Lijo.Lazar@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Subject: RE: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
 

[Public]


Well … I don’t know what other feedback I can give here then.

We’re bouncing back and forth talking about language/function/logical structure or whatever.

I’m of the opinion that there are too many unnecessary wrappers here and am biased to unbroken steps that make it easier to debug/dev later on (at least for me).

Maybe someone else has a different opinion.

 

Jon

 

From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Sent: Friday, February 7, 2025 11:06 AM
To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Subject: Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery

 

[Public]

 

It so happens that driver gets part of the XGMI information through registers in GMC. The intent of those registers is to help GMC to figure out memory access when device part of XGMI hive. Driver using those regs doesn't mean XGMI is like a sub ip of GMC, it remains separate.

 

Thanks,

Lijo


From: Kim, Jonathan <Jonathan.Kim@xxxxxxx>
Sent: Friday, February 7, 2025 9:26:28 PM
To: Lazar, Lijo <Lijo.Lazar@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Subject: RE: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery

 

[Public]

> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> Sent: Friday, February 7, 2025 10:18 AM
> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
> Subject: Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
>
>
>
> On 2/7/2025 8:06 PM, Kim, Jonathan wrote:
> > [Public]
> >
> >> -----Original Message-----
> >> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> >> Sent: Friday, February 7, 2025 9:20 AM
> >> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
> >> Subject: Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
> >>
> >>
> >>
> >> On 2/7/2025 7:29 PM, Kim, Jonathan wrote:
> >>> [Public]
> >>>
> >>>> -----Original Message-----
> >>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> >>>> Sent: Thursday, February 6, 2025 10:56 PM
> >>>> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
> >>>> Subject: Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
> >>>>
> >>>>
> >>>>
> >>>> On 2/7/2025 5:03 AM, Kim, Jonathan wrote:
> >>>>> [Public]
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> >>>>>> Sent: Thursday, February 6, 2025 8:13 AM
> >>>>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> >>>>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Kim, Jonathan
> >>>>>> <Jonathan.Kim@xxxxxxx>
> >>>>>> Subject: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
> >>>>>>
> >>>>>> Initialize xgmi related static information during discovery.
> >>>>>>
> >>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx>
> >>>>>> ---
> >>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 20 +++++++++++++--
> ----
> >>>>>>  1 file changed, 14 insertions(+), 6 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> >>>>>> index eca431e52038..d4eade2bd4d3 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> >>>>>> @@ -2502,6 +2502,19 @@ static int
> >>>> amdgpu_discovery_set_isp_ip_blocks(struct
> >>>>>> amdgpu_device *adev)
> >>>>>>       return 0;
> >>>>>>  }
> >>>>>>
> >>>>>> +static void amdgpu_discovery_set_xgmi_info(struct amdgpu_device *adev)
> >>>>>> +{
> >>>>>> +     if (amdgpu_ip_version(adev, XGMI_HWIP, 0) == IP_VERSION(4, 8, 0))
> >>>>>> +             adev->gmc.xgmi.supported = true;
> >>>>>> +
> >>>>>> +     if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) ||
> >>>>>> +         amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4))
> >>>>>> +             adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 4, 0);
> >>>>>
> >>>>> Can this stuff get rolled into xgm_init_info and called directly into
> >>>> amdgpu_discovery_set_ip_blocks?
> >>>>> Breaking up discovery_set_xgmi_info and xgmi_init_info as 2 separate things
> >>>> seems a little confusing.
> >>>>>
> >>>>
> >>>> Intent is like this -
> >>>>       Set IP version info. This is the job of discovery and kept inside
> >>>> amdgpu_discovery.
> >>>>       Set any static information derived out of IP versions and not available
> >>>> in discovery tables. This is kept outside of discovery file.
> >>>
> >>>
> >>> Then why are you proposing to set up static information in the discovery file in
> the
> >> first place?
> >>
> >> I didn't understand that statement. The function - amdgpu_xgmi_init_info
> >> - called from discovery sets up the derived information. Only IP version
> >> info is set inside discovery.
> >
> > Snip from you're last response:
> >>>>       Set any static information derived out of IP versions and not available
> >>>> in discovery tables. This is kept outside of discovery file.
> > You're calling amdgpu_discovery_set_xgmi_info which calls
> amdgpu_xgmi_init_info which is setting static derived information in the discovery
> file.
> > A wrapper called in a wrapper is still doing the opposite of what you're saying int
> the snip above.
> > If you're trying to avoid this and keep discovery clean, call xgmi_init_info in
> amdgpu_device.c somewhere after the IP blocks are set.
> > And put xgmi_supported definitions in xgmi_init_info since that doesn't count as IP
> version setting.
> >
>
> This is only about structural segregation - like the place where we want
> to maintain xgmi related data. Functions setting IP versions and
> information from discovery table is kept inside discovery. Any function
> which adds further static data out of the IP version is kept in the IP
> related file.
>
> This is not about a logical separation like xgmi related information
> derived from an IP version shouldn't be set at discovery phase.

Yeah I get it, there's function in language structure but I could also argue that language structures should point to function.
Otherwise, we could end up with a bunch of word salad.
I wonder if it makes more sense to roll up speed and width info somewhere in GFXHUB initialization.
The xGMI information is GMC based and xgmi_supported doesn't rely on IP versioning IIRC but rather the number of physical nodes determined by the memory controller.
e.g. gfxhub_v2_1_get_xgmi_info.
Then that would take the pressure off all this file reference jumping and version checking.

Jon

>
> Thanks,
> Lijo
>
> > Jon
> >
> >>
> >> Thanks,
> >> Lijo
> >>
> >>>
> >>> Jon
> >>>
> >>>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>>>> +
> >>>>>> +     if (amdgpu_ip_version(adev, XGMI_HWIP, 0))
> >>>>>
> >>>>> Maybe roll this check into xgmi_init_info i.e. void early return if null.
> >>>>>
> >>>>>> +             amdgpu_xgmi_init_info(adev);
> >>>>>> +}
> >>>>>> +
> >>>>>>  int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
> >>>>>>  {
> >>>>>>       int r;
> >>>>>> @@ -2769,12 +2782,7 @@ int amdgpu_discovery_set_ip_blocks(struct
> >>>>>> amdgpu_device *adev)
> >>>>>>               break;
> >>>>>>       }
> >>>>>>
> >>>>>> -     if (amdgpu_ip_version(adev, XGMI_HWIP, 0) == IP_VERSION(4, 8, 0))
> >>>>>> -             adev->gmc.xgmi.supported = true;
> >>>>>> -
> >>>>>> -     if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) ||
> >>>>>> -         amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4))
> >>>>>> -             adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 4, 0);
> >>>>>> +     amdgpu_discovery_set_xgmi_info(adev);
> >>>>>
> >>>>> If you do the suggestions above, you can just call amdgpu_xgmi_init_info
> >>>> unconditionally.
> >>>>>
> >>>>> Jon
> >>>>>>
> >>>>>>       /* set NBIO version */
> >>>>>>       switch (amdgpu_ip_version(adev, NBIO_HWIP, 0)) {
> >>>>>> --
> >>>>>> 2.25.1
> >>>>>
> >>>
> >


[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux