On 2/8/2025 3:15 AM, Kim, Jonathan wrote: > [Public] > > > I think part of the problem is that gmc.xgmi.supported has weird usage > and definition. > > It’s partly says that it has potential to be supported by IP version, > but doesn’t actually say anything about real support but assumed say it > has real support in amdgpu_xgmi.c usage. > > Real support is determined by gfxhub_...get_xgmi_info which has comments > -> /* PF_MAX_REGION=0 means xgmi is disabled */ and error > returns on failure to read. > > On top of that, the gmc.xgmi.supported field gets set by both > amdgpu_discovery.c based on XGMI HW IP but is also set by gmc_v9_0.c > early init base on GC version. > > I think you’re locked into doing a wrapper on a wrapper because the way > gmc.xgmi.supported is set has become spaghetti code. > > If that gets clean up, and we do your suggestion of pulling xgmi related > info into early init i.e. get info based on actual verification that > gfxhub says xgmi is ok, I think the series would start to make much > more sense. > There are more inputs from Hawking as well. I think that till solve xgmi supported being set at different places. Will send a v2. Thanks, Lijo > > > Jon > > > > *From:*Lazar, Lijo <Lijo.Lazar@xxxxxxx> > *Sent:* Friday, February 7, 2025 12:35 PM > *To:* Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > *Cc:* Zhang, Hawking <Hawking.Zhang@xxxxxxx> > *Subject:* Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery > > > > [Public] > > > > Another try.. if it helps (you or someone else) > > > > This series introduces two functions for maintenance. > > > > amdgpu_xgmi_init_info - This is to initialize any XGM related static > information. Now it's called as soon as XGMI version is discovered. Now, > if that is causing some confusion, then I could rename to > xgmi_early_init and call from device early init. The intent of the > function is to initialise any static info related to XGMI. > > > > amdgpu_xgmi_get_max_bandwidth - Assumes all links are uniform and > provides the max theoretical bandwidth. Currently, the calculation is > simple as width * speed. In future, this may change based on IP version > like speed * width * x_factor or get the bandwidth from FW etc. Caller > is expected to get this uniform interface for any XGMI IP version. > > > > And lastly, both functions are maintained in amdgpu_xgmi.c > > > > Thanks, > > Lijo > > ------------------------------------------------------------------------ > > *From:*Kim, Jonathan <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>> > *Sent:* Friday, February 7, 2025 9:58:30 PM > *To:* Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>>; amd- > gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> <amd- > gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>> > *Cc:* Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>> > *Subject:* RE: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during > discovery > > > > [Public] > > > > Well … I don’t know what other feedback I can give here then. > > We’re bouncing back and forth talking about language/function/logical > structure or whatever. > > I’m of the opinion that there are too many unnecessary wrappers here and > am biased to unbroken steps that make it easier to debug/dev later on > (at least for me). > > Maybe someone else has a different opinion. > > > > Jon > > > > *From:*Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>> > *Sent:* Friday, February 7, 2025 11:06 AM > *To:* Kim, Jonathan <Jonathan.Kim@xxxxxxx > <mailto:Jonathan.Kim@xxxxxxx>>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> > *Cc:* Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>> > *Subject:* Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery > > > > [Public] > > > > It so happens that driver gets part of the XGMI information through > registers in GMC. The intent of those registers is to help GMC to > figure out memory access when device part of XGMI hive. Driver using > those regs doesn't mean XGMI is like a sub ip of GMC, it remains separate. > > > > Thanks, > > Lijo > > ------------------------------------------------------------------------ > > *From:*Kim, Jonathan <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>> > *Sent:* Friday, February 7, 2025 9:26:28 PM > *To:* Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>>; amd- > gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> <amd- > gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>> > *Cc:* Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>> > *Subject:* RE: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during > discovery > > > > [Public] > >> -----Original Message----- >> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>> >> Sent: Friday, February 7, 2025 10:18 AM >> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>>; amd- > gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> >> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>> >> Subject: Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery >> >> >> >> On 2/7/2025 8:06 PM, Kim, Jonathan wrote: >> > [Public] >> > >> >> -----Original Message----- >> >> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>> >> >> Sent: Friday, February 7, 2025 9:20 AM >> >> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>>; amd- > gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> >> >> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>> >> >> Subject: Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery >> >> >> >> >> >> >> >> On 2/7/2025 7:29 PM, Kim, Jonathan wrote: >> >>> [Public] >> >>> >> >>>> -----Original Message----- >> >>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>> >> >>>> Sent: Thursday, February 6, 2025 10:56 PM >> >>>> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>>; amd- > gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> >> >>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>> >> >>>> Subject: Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery >> >>>> >> >>>> >> >>>> >> >>>> On 2/7/2025 5:03 AM, Kim, Jonathan wrote: >> >>>>> [Public] >> >>>>> >> >>>>>> -----Original Message----- >> >>>>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>> >> >>>>>> Sent: Thursday, February 6, 2025 8:13 AM >> >>>>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; > Lazar, Lijo <Lijo.Lazar@xxxxxxx <mailto:Lijo.Lazar@xxxxxxx>> >> >>>>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx <mailto:Hawking.Zhang@xxxxxxx>>; Kim, Jonathan >> >>>>>> <Jonathan.Kim@xxxxxxx <mailto:Jonathan.Kim@xxxxxxx>> >> >>>>>> Subject: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery >> >>>>>> >> >>>>>> Initialize xgmi related static information during discovery. >> >>>>>> >> >>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx <mailto:lijo.lazar@xxxxxxx>> >> >>>>>> --- >> >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 20 +++++++++++++-- >> ---- >> >>>>>> 1 file changed, 14 insertions(+), 6 deletions(-) >> >>>>>> >> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c >> >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c >> >>>>>> index eca431e52038..d4eade2bd4d3 100644 >> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c >> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c >> >>>>>> @@ -2502,6 +2502,19 @@ static int >> >>>> amdgpu_discovery_set_isp_ip_blocks(struct >> >>>>>> amdgpu_device *adev) >> >>>>>> return 0; >> >>>>>> } >> >>>>>> >> >>>>>> +static void amdgpu_discovery_set_xgmi_info(struct amdgpu_device *adev) >> >>>>>> +{ >> >>>>>> + if (amdgpu_ip_version(adev, XGMI_HWIP, 0) == IP_VERSION(4, 8, 0)) >> >>>>>> + adev->gmc.xgmi.supported = true; >> >>>>>> + >> >>>>>> + if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || >> >>>>>> + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) >> >>>>>> + adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 4, 0); >> >>>>> >> >>>>> Can this stuff get rolled into xgm_init_info and called directly into >> >>>> amdgpu_discovery_set_ip_blocks? >> >>>>> Breaking up discovery_set_xgmi_info and xgmi_init_info as 2 separate things >> >>>> seems a little confusing. >> >>>>> >> >>>> >> >>>> Intent is like this - >> >>>> Set IP version info. This is the job of discovery and kept inside >> >>>> amdgpu_discovery. >> >>>> Set any static information derived out of IP versions and not available >> >>>> in discovery tables. This is kept outside of discovery file. >> >>> >> >>> >> >>> Then why are you proposing to set up static information in the discovery file in >> the >> >> first place? >> >> >> >> I didn't understand that statement. The function - amdgpu_xgmi_init_info >> >> - called from discovery sets up the derived information. Only IP version >> >> info is set inside discovery. >> > >> > Snip from you're last response: >> >>>> Set any static information derived out of IP versions and not available >> >>>> in discovery tables. This is kept outside of discovery file. >> > You're calling amdgpu_discovery_set_xgmi_info which calls >> amdgpu_xgmi_init_info which is setting static derived information in the discovery >> file. >> > A wrapper called in a wrapper is still doing the opposite of what you're saying int >> the snip above. >> > If you're trying to avoid this and keep discovery clean, call xgmi_init_info in >> amdgpu_device.c somewhere after the IP blocks are set. >> > And put xgmi_supported definitions in xgmi_init_info since that doesn't count as IP >> version setting. >> > >> >> This is only about structural segregation - like the place where we want >> to maintain xgmi related data. Functions setting IP versions and >> information from discovery table is kept inside discovery. Any function >> which adds further static data out of the IP version is kept in the IP >> related file. >> >> This is not about a logical separation like xgmi related information >> derived from an IP version shouldn't be set at discovery phase. > > Yeah I get it, there's function in language structure but I could also > argue that language structures should point to function. > Otherwise, we could end up with a bunch of word salad. > I wonder if it makes more sense to roll up speed and width info > somewhere in GFXHUB initialization. > The xGMI information is GMC based and xgmi_supported doesn't rely on IP > versioning IIRC but rather the number of physical nodes determined by > the memory controller. > e.g. gfxhub_v2_1_get_xgmi_info. > Then that would take the pressure off all this file reference jumping > and version checking. > > Jon > >> >> Thanks, >> Lijo >> >> > Jon >> > >> >> >> >> Thanks, >> >> Lijo >> >> >> >>> >> >>> Jon >> >>> >> >>>> >> >>>> Thanks, >> >>>> Lijo >> >>>> >> >>>>>> + >> >>>>>> + if (amdgpu_ip_version(adev, XGMI_HWIP, 0)) >> >>>>> >> >>>>> Maybe roll this check into xgmi_init_info i.e. void early return if null. >> >>>>> >> >>>>>> + amdgpu_xgmi_init_info(adev); >> >>>>>> +} >> >>>>>> + >> >>>>>> int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev) >> >>>>>> { >> >>>>>> int r; >> >>>>>> @@ -2769,12 +2782,7 @@ int amdgpu_discovery_set_ip_blocks(struct >> >>>>>> amdgpu_device *adev) >> >>>>>> break; >> >>>>>> } >> >>>>>> >> >>>>>> - if (amdgpu_ip_version(adev, XGMI_HWIP, 0) == IP_VERSION(4, 8, 0)) >> >>>>>> - adev->gmc.xgmi.supported = true; >> >>>>>> - >> >>>>>> - if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || >> >>>>>> - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) >> >>>>>> - adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 4, 0); >> >>>>>> + amdgpu_discovery_set_xgmi_info(adev); >> >>>>> >> >>>>> If you do the suggestions above, you can just call amdgpu_xgmi_init_info >> >>>> unconditionally. >> >>>>> >> >>>>> Jon >> >>>>>> >> >>>>>> /* set NBIO version */ >> >>>>>> switch (amdgpu_ip_version(adev, NBIO_HWIP, 0)) { >> >>>>>> -- >> >>>>>> 2.25.1 >> >>>>> >> >>> >> > >