On 10/22/2024 12:55 AM, Felix Kuehling wrote: > > > On 2024-10-21 10:07, Lazar, Lijo wrote: >> >> >> On 10/19/2024 12:46 AM, Felix Kuehling wrote: >>> >>> On 2024-10-14 05:19, Lijo Lazar wrote: >>>> In certain cases - ex: when a reset is required on initialization - XCP >>>> manager won't have a valid partition mode. In such cases, use SPX as the >>>> default selected mode for which partition configuration details are >>>> populated. >>>> >>>> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx> >>>> Reported-by: Hao Zhou <hao.zhou@xxxxxxx> >>>> >>>> Fixes: c7de57033d9b ("drm/amdgpu: Add sysfs nodes to get xcp details") >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 10 +++++++--- >>>> 1 file changed, 7 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c >>>> index 111bf897e72e..83a16918ea76 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c >>>> @@ -606,7 +606,7 @@ void amdgpu_xcp_cfg_sysfs_init(struct >>>> amdgpu_device *adev) >>>> { >>>> struct amdgpu_xcp_res_details *xcp_res; >>>> struct amdgpu_xcp_cfg *xcp_cfg; >>>> - int i, r, j, rid; >>>> + int i, r, j, rid, mode; >>>> if (!adev->xcp_mgr) >>>> return; >>>> @@ -625,11 +625,15 @@ void amdgpu_xcp_cfg_sysfs_init(struct >>>> amdgpu_device *adev) >>>> if (r) >>>> goto err1; >>>> - r = amdgpu_xcp_get_res_info(xcp_cfg->xcp_mgr, >>>> xcp_cfg->xcp_mgr->mode, xcp_cfg); >>>> + mode = (xcp_cfg->xcp_mgr->mode == >>>> + AMDGPU_UNKNOWN_COMPUTE_PARTITION_MODE) ? >>>> + AMDGPU_SPX_PARTITION_MODE : >>>> + xcp_cfg->xcp_mgr->mode; >>> >>> Shouldn't this depend on the memory partition mode as well? You must >>> have at least as many compute partitions as memory partitions because >>> each compute partition can only use a single memory partition. >> >> This is not dependent on the current/active compute partition mode. It >> is to show the configuration (number of xccs, vcns, shared etc.) >> supported for a partition mode. SPX is the default partition mode at >> boot up. That is used as the default mode. >> >> It's not a strict one-to-one, a compute partition may use other memory >> partitions also non-coherently. > > I agree it's not 1:1. Multiple compute partitions can share a single memory partition. But there is no way for a compute partition to use multiple memory partitions, at least with compute APIs. Each partition exposes only one VRAM heap. Therefore I think the memory partition mode should influence the default compute partition mode. E.g. SPX can only work in NPS1 mode. > There are cases like one node/partition can transfer data to/fro other nodes' memory (similar to multi GPU cases). I am not sure if you consider that as a 'usage'. Anyway, in this case, memory partition is not considered. xcp_config is an option which allows to select the partition mode for which the configuration is shown (this is independent of the current memory partition mode). This patch makes it such that the default selected one on driver load is SPX which is the default system boot mode. The way to achieve SPX may include a memory partition switch based on the current memory partition mode, but that is not the consideration of this patch. Thanks, Lijo > Regards, > Felix > >> >> Thanks, >> Lijo >> >>> >>> Regards, >>> Felix >>> >>> >>>> + r = amdgpu_xcp_get_res_info(xcp_cfg->xcp_mgr, mode, xcp_cfg); >>>> if (r) >>>> goto err1; >>>> - xcp_cfg->mode = xcp_cfg->xcp_mgr->mode; >>>> + xcp_cfg->mode = mode; >>>> for (i = 0; i < xcp_cfg->num_res; i++) { >>>> xcp_res = &xcp_cfg->xcp_res[i]; >>>> rid = xcp_res->id;