[Public]

> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> Sent: Wednesday, July 19, 2023 12:53 PM
> To: Chen, Guchun <Guchun.Chen@xxxxxxx>; Zhu, James <James.Zhu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Kamal, Asad <Asad.Kamal@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
> Subject: Re: [PATCH] drm/amdgpu: Update ring scheduler info as needed
>
> On 7/19/2023 10:12 AM, Chen, Guchun wrote:
> > [Public]
> >
> >> -----Original Message-----
> >> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Lazar, Lijo
> >> Sent: Wednesday, July 19, 2023 12:13 AM
> >> To: Zhu, James <James.Zhu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhu, James <James.Zhu@xxxxxxx>; Kamal, Asad <Asad.Kamal@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
> >> Subject: Re: [PATCH] drm/amdgpu: Update ring scheduler info as needed
> >>
> >> On 7/18/2023 9:39 PM, James Zhu wrote:
> >>> On 2023-07-18 11:54, Lazar, Lijo wrote:
> >>>> On 7/18/2023 8:39 PM, James Zhu wrote:
> >>>>> On 2023-07-18 10:19, Lazar, Lijo wrote:
> >>>>>> On 7/18/2023 7:30 PM, James Zhu wrote:
> >>>>>>> On 2023-07-18 08:21, Lijo Lazar wrote:
> >>>>>>>> Not all rings have a scheduler associated. Only update scheduler
> >>>>>>>> data for rings with a scheduler. Otherwise it could result in
> >>>>>>>> out-of-bounds access, as there are more rings in total than those
> >>>>>>>> associated with particular IPs.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx>
> >>>>>>>> ---
> >>>>>>>>  drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 2 +-
> >>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>>>> index 72b629a78c62..d0fc62784e82 100644
> >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>>>> @@ -134,7 +134,7 @@ static int aqua_vanjaram_xcp_sched_list_update(
> >>>>>>>>      for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
> >>>>>>>>          ring = adev->rings[i];
> >>>>>>>> -        if (!ring || !ring->sched.ready)
> >>>>>>>> +        if (!ring || !ring->sched.ready || ring->no_scheduler)
> >>>>>>> [JZ] Is there any case where ring->no_scheduler = true, but
> >>>>>>> ring->sched.ready = true?
> >>>>>>
> >>>>>> amdgpu_gfx_kiq_init_ring sets the no_scheduler flag to true for KIQ
> >>>>>> rings. amdgpu_device_init_schedulers inits schedulers only for rings
> >>>>>> that don't have no_scheduler set. However, in
> >>>>>> gfx_v9_4_3_xcc_kiq_resume, the ring is marked with
> >>>>>> ring->sched.ready = true;
> >>>>>>
> >>>>>> I'm not sure why it's done this way, but this is the case even
> >>>>>> for gfxv9.
>
> > The driver so far still has some concept abuse of sched.ready. In
> > amdgpu_device_init_schedulers, it's set to true once the sw scheduler
> > the ring requires is set up; however, after the driver is up, this
> > flag works like a hint that the corresponding ring is ready for HW
> > submission after the ring test succeeds.
> >
> > For this case, it's probably caused by amdgpu_gfx_enable_kcq calling
> > amdgpu_ring_test_helper, which sets sched.ready unconditionally based
> > on the ring test result. This will lead to a value mismatch between
> > ring->no_scheduler and ring->sched.ready.
> > If my understanding is correct, I think adding a check in this helper
> > function which only sets sched.ready when no_scheduler is false will
> > help. So I will provide a patch soon to prevent this mismatch in the
> > future.
> >
> > +	if (!ring->no_scheduler)
> > +		ring->sched.ready = !r;
>
> The ring.ready (ring->sched.ready) flag is used in the gmcv9 code as
> well. This will need more rework.

Exactly, let me double check.

Regards,
Guchun

> Thanks,
> Lijo
>
> > Regards,
> > Guchun
> >
> >>>>> [JZ] I think the sched.ready confuses people. Here it just means
> >>>>> the ring is ready to be used.
> >>>>
> >>>> Thanks for the clarification. The question then is: do we need to
> >>>> check the ring ready status for assigning xcp ids, or just check if
> >>>> the ring is associated with a scheduler?
> >>>>
> >>>> Keep only this - if (!ring || ring->no_scheduler) - to assign xcp
> >>>> ids, and leave the ring ready status to the logic which really
> >>>> accepts jobs on the ring?
> >>> [JZ] I think keeping ring->sched.ready will reduce the schedule list,
> >>> which will save a little scheduling time.
> >>
> >> Fine, will just push this one.
> >>
> >> Thanks,
> >> Lijo
> >>
> >>>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>>>> This patch is Reviewed-by: James Zhu <James.Zhu@xxxxxxx>
> >>>>>
> >>>>>> Thanks,
> >>>>>> Lijo
> >>>>>>
> >>>>>>>>              continue;
> >>>>>>>>          aqua_vanjaram_xcp_gpu_sched_update(adev, ring, ring->xcp_id);
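[Editor's note] The guard Guchun suggests for amdgpu_ring_test_helper can be sketched in isolation. The snippet below is a minimal userspace model, not the kernel code: struct ring_stub, its sched member, and ring_test_stub are invented stand-ins for struct amdgpu_ring and the real hardware ring test, and only the final if-guard mirrors the suggestion in the thread.

```c
#include <stdbool.h>

/* Simplified stand-in for the scheduler state embedded in the ring. */
struct sched_stub {
	bool ready;		/* mirrors ring->sched.ready */
};

/* Simplified stand-in for struct amdgpu_ring. */
struct ring_stub {
	struct sched_stub sched;
	bool no_scheduler;	/* true for KIQ rings, which have no sw scheduler */
};

/* Pretend ring test: 0 on success, negative errno-style code on failure. */
static int ring_test_stub(struct ring_stub *ring, bool hw_ok)
{
	(void)ring;
	return hw_ok ? 0 : -5; /* -EIO */
}

/*
 * Sketch of the guarded helper: only derive sched.ready from the ring
 * test result when the ring actually has a scheduler attached, so that
 * no_scheduler rings (e.g. KIQ) never end up with sched.ready == true.
 */
static int ring_test_helper_stub(struct ring_stub *ring, bool hw_ok)
{
	int r = ring_test_stub(ring, hw_ok);

	if (!ring->no_scheduler)
		ring->sched.ready = !r;

	return r;
}
```

With this guard, a KIQ-style ring (no_scheduler == true) keeps sched.ready false even when its ring test passes, removing the mismatch discussed above, while rings with a scheduler still track the test result as before.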