Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2/8/2025 3:15 AM, Kim, Jonathan wrote:
> [Public]
> 
> 
> I think part of the problem is that gmc.xgmi.supported has weird usage
> and definition.
> 
> It’s partly says that it has potential to be supported by IP version,
> but doesn’t actually say anything about real support but assumed say it
> has real support in amdgpu_xgmi.c usage.
> 
> Real support is determined by gfxhub_...get_xgmi_info which has comments
> ->         /* PF_MAX_REGION=0 means xgmi is disabled */ and error
> returns on failure to read.
> 
> On top of that, the gmc.xgmi.supported field gets set by both
> amdgpu_discovery.c based on XGMI HW IP but is also set by gmc_v9_0.c
> early init base on GC version.
> 
> I think you’re locked into doing a wrapper on a wrapper because the way
> gmc.xgmi.supported is set has become spaghetti code.
> 
> If that gets clean up, and we do your suggestion of pulling xgmi related
> info into early init i.e. get info based on actual verification that
> gfxhub says xgmi is ok,  I think the series would start to make much
> more sense.
> 

There are more inputs from Hawking as well. I think that till solve xgmi
supported being set at different places. Will send a v2.

Thanks,
Lijo
>  
> 
> Jon
> 
>  
> 
> *From:*Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> *Sent:* Friday, February 7, 2025 12:35 PM
> *To:* Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> *Cc:* Zhang, Hawking <Hawking.Zhang@xxxxxxx>
> *Subject:* Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
> 
>  
> 
> [Public]
> 
>  
> 
> Another try.. if it helps (you or someone else)
> 
>  
> 
> This series introduces two functions for maintenance.
> 
>  
> 
> amdgpu_xgmi_init_info - This is to initialize any XGM related static
> information. Now it's called as soon as XGMI version is discovered. Now,
> if that is causing some confusion, then I could rename to
> xgmi_early_init and call from device early init. The intent of the
> function is to initialise any static info related to XGMI.
> 
>  
> 
> amdgpu_xgmi_get_max_bandwidth - Assumes all links are uniform and
> provides the max theoretical bandwidth. Currently, the calculation is
> simple as width * speed. In future, this may change based on IP version
> like speed * width * x_factor or get the bandwidth from FW etc. Caller
> is expected to get this uniform interface for any XGMI IP version.
> 
>  
> 
> And lastly, both functions are maintained in amdgpu_xgmi.c
> 
>  
> 
> Thanks,
> 
> Lijo
> 
> ------------------------------------------------------------------------
> 
> *From:*Kim, Jonathan <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>>
> *Sent:* Friday, February 7, 2025 9:58:30 PM
> *To:* Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>>; amd-
> gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> <amd-
> gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>>
> *Cc:* Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>>
> *Subject:* RE: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during
> discovery
> 
>  
> 
> [Public]
> 
>  
> 
> Well … I don’t know what other feedback I can give here then.
> 
> We’re bouncing back and forth talking about language/function/logical
> structure or whatever.
> 
> I’m of the opinion that there are too many unnecessary wrappers here and
> am biased to unbroken steps that make it easier to debug/dev later on
> (at least for me).
> 
> Maybe someone else has a different opinion.
> 
>  
> 
> Jon
> 
>  
> 
> *From:*Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>>
> *Sent:* Friday, February 7, 2025 11:06 AM
> *To:* Kim, Jonathan <Jonathan.Kim@xxxxxxx
> <mailto:Jonathan.Kim@xxxxxxx>>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
> *Cc:* Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>>
> *Subject:* Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
> 
>  
> 
> [Public]
> 
>  
> 
> It so happens that driver gets part of the XGMI information through
> registers in GMC. The intent of those registers is to help GMC to
> figure out memory access when device part of XGMI hive. Driver using
> those regs doesn't mean XGMI is like a sub ip of GMC, it remains separate.
> 
>  
> 
> Thanks,
> 
> Lijo
> 
> ------------------------------------------------------------------------
> 
> *From:*Kim, Jonathan <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>>
> *Sent:* Friday, February 7, 2025 9:26:28 PM
> *To:* Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>>; amd-
> gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> <amd-
> gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>>
> *Cc:* Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>>
> *Subject:* RE: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during
> discovery
> 
>  
> 
> [Public]
> 
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>>
>> Sent: Friday, February 7, 2025 10:18 AM
>> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>>; amd-
> gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>>
>> Subject: Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
>>
>>
>>
>> On 2/7/2025 8:06 PM, Kim, Jonathan wrote:
>> > [Public]
>> >
>> >> -----Original Message-----
>> >> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>>
>> >> Sent: Friday, February 7, 2025 9:20 AM
>> >> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>>; amd-
> gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
>> >> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>>
>> >> Subject: Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
>> >>
>> >>
>> >>
>> >> On 2/7/2025 7:29 PM, Kim, Jonathan wrote:
>> >>> [Public]
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>>
>> >>>> Sent: Thursday, February 6, 2025 10:56 PM
>> >>>> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>>; amd-
> gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
>> >>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>>
>> >>>> Subject: Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 2/7/2025 5:03 AM, Kim, Jonathan wrote:
>> >>>>> [Public]
>> >>>>>
>> >>>>>> -----Original Message-----
>> >>>>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>>
>> >>>>>> Sent: Thursday, February 6, 2025 8:13 AM
>> >>>>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>;
> Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>>
>> >>>>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>>; Kim, Jonathan
>> >>>>>> <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>>
>> >>>>>> Subject: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery
>> >>>>>>
>> >>>>>> Initialize xgmi related static information during discovery.
>> >>>>>>
>> >>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx <mailto:lijo.lazar@xxxxxxx>>
>> >>>>>> ---
>> >>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 20 +++++++++++++--
>> ----
>> >>>>>>  1 file changed, 14 insertions(+), 6 deletions(-)
>> >>>>>>
>> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
>> >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
>> >>>>>> index eca431e52038..d4eade2bd4d3 100644
>> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
>> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
>> >>>>>> @@ -2502,6 +2502,19 @@ static int
>> >>>> amdgpu_discovery_set_isp_ip_blocks(struct
>> >>>>>> amdgpu_device *adev)
>> >>>>>>       return 0;
>> >>>>>>  }
>> >>>>>>
>> >>>>>> +static void amdgpu_discovery_set_xgmi_info(struct amdgpu_device *adev)
>> >>>>>> +{
>> >>>>>> +     if (amdgpu_ip_version(adev, XGMI_HWIP, 0) == IP_VERSION(4, 8, 0))
>> >>>>>> +             adev->gmc.xgmi.supported = true;
>> >>>>>> +
>> >>>>>> +     if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) ||
>> >>>>>> +         amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4))
>> >>>>>> +             adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 4, 0);
>> >>>>>
>> >>>>> Can this stuff get rolled into xgm_init_info and called directly into
>> >>>> amdgpu_discovery_set_ip_blocks?
>> >>>>> Breaking up discovery_set_xgmi_info and xgmi_init_info as 2 separate things
>> >>>> seems a little confusing.
>> >>>>>
>> >>>>
>> >>>> Intent is like this -
>> >>>>       Set IP version info. This is the job of discovery and kept inside
>> >>>> amdgpu_discovery.
>> >>>>       Set any static information derived out of IP versions and not available
>> >>>> in discovery tables. This is kept outside of discovery file.
>> >>>
>> >>>
>> >>> Then why are you proposing to set up static information in the discovery file in
>> the
>> >> first place?
>> >>
>> >> I didn't understand that statement. The function - amdgpu_xgmi_init_info
>> >> - called from discovery sets up the derived information. Only IP version
>> >> info is set inside discovery.
>> >
>> > Snip from you're last response:
>> >>>>       Set any static information derived out of IP versions and not available
>> >>>> in discovery tables. This is kept outside of discovery file.
>> > You're calling amdgpu_discovery_set_xgmi_info which calls
>> amdgpu_xgmi_init_info which is setting static derived information in the discovery
>> file.
>> > A wrapper called in a wrapper is still doing the opposite of what you're saying int
>> the snip above.
>> > If you're trying to avoid this and keep discovery clean, call xgmi_init_info in
>> amdgpu_device.c somewhere after the IP blocks are set.
>> > And put xgmi_supported definitions in xgmi_init_info since that doesn't count as IP
>> version setting.
>> >
>>
>> This is only about structural segregation - like the place where we want
>> to maintain xgmi related data. Functions setting IP versions and
>> information from discovery table is kept inside discovery. Any function
>> which adds further static data out of the IP version is kept in the IP
>> related file.
>>
>> This is not about a logical separation like xgmi related information
>> derived from an IP version shouldn't be set at discovery phase.
> 
> Yeah I get it, there's function in language structure but I could also
> argue that language structures should point to function.
> Otherwise, we could end up with a bunch of word salad.
> I wonder if it makes more sense to roll up speed and width info
> somewhere in GFXHUB initialization.
> The xGMI information is GMC based and xgmi_supported doesn't rely on IP
> versioning IIRC but rather the number of physical nodes determined by
> the memory controller.
> e.g. gfxhub_v2_1_get_xgmi_info.
> Then that would take the pressure off all this file reference jumping
> and version checking.
> 
> Jon
> 
>>
>> Thanks,
>> Lijo
>>
>> > Jon
>> >
>> >>
>> >> Thanks,
>> >> Lijo
>> >>
>> >>>
>> >>> Jon
>> >>>
>> >>>>
>> >>>> Thanks,
>> >>>> Lijo
>> >>>>
>> >>>>>> +
>> >>>>>> +     if (amdgpu_ip_version(adev, XGMI_HWIP, 0))
>> >>>>>
>> >>>>> Maybe roll this check into xgmi_init_info i.e. void early return if null.
>> >>>>>
>> >>>>>> +             amdgpu_xgmi_init_info(adev);
>> >>>>>> +}
>> >>>>>> +
>> >>>>>>  int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
>> >>>>>>  {
>> >>>>>>       int r;
>> >>>>>> @@ -2769,12 +2782,7 @@ int amdgpu_discovery_set_ip_blocks(struct
>> >>>>>> amdgpu_device *adev)
>> >>>>>>               break;
>> >>>>>>       }
>> >>>>>>
>> >>>>>> -     if (amdgpu_ip_version(adev, XGMI_HWIP, 0) == IP_VERSION(4, 8, 0))
>> >>>>>> -             adev->gmc.xgmi.supported = true;
>> >>>>>> -
>> >>>>>> -     if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) ||
>> >>>>>> -         amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4))
>> >>>>>> -             adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 4, 0);
>> >>>>>> +     amdgpu_discovery_set_xgmi_info(adev);
>> >>>>>
>> >>>>> If you do the suggestions above, you can just call amdgpu_xgmi_init_info
>> >>>> unconditionally.
>> >>>>>
>> >>>>> Jon
>> >>>>>>
>> >>>>>>       /* set NBIO version */
>> >>>>>>       switch (amdgpu_ip_version(adev, NBIO_HWIP, 0)) {
>> >>>>>> --
>> >>>>>> 2.25.1
>> >>>>>
>> >>>
>> >
> 




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux