[AMD Official Use Only] > -----Original Message----- > From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > Sent: Wednesday, August 18, 2021 5:35 PM > To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Liu, > Leo <Leo.Liu@xxxxxxx> > Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Chen, Guchun > <Guchun.Chen@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx> > Subject: Re: [PATCH] drm/amdgpu: properly powergate Polaris12 UVD/VCE > on suspend > > > > On 8/18/2021 2:15 PM, Quan, Evan wrote: > > [AMD Official Use Only] > > > > > > > >> -----Original Message----- > >> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > >> Sent: Tuesday, August 17, 2021 6:09 PM > >> To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; > >> Liu, Leo <Leo.Liu@xxxxxxx> > >> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Chen, Guchun > >> <Guchun.Chen@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx> > >> Subject: Re: [PATCH] drm/amdgpu: properly powergate Polaris12 > UVD/VCE > >> on suspend > >> > >> > >> > >> On 8/17/2021 3:13 PM, Quan, Evan wrote: > >>> [AMD Official Use Only] > >>> > >>> +Leo to share his insights > >>> > >>>> -----Original Message----- > >>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > >>>> Sent: Tuesday, August 17, 2021 3:28 PM > >>>> To: Quan, Evan <Evan.Quan@xxxxxxx>; amd- > gfx@xxxxxxxxxxxxxxxxxxxxx > >>>> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Chen, > Guchun > >>>> <Guchun.Chen@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx> > >>>> Subject: Re: [PATCH] drm/amdgpu: properly powergate Polaris12 > >> UVD/VCE > >>>> on suspend > >>>> > >>>> > >>>> > >>>> On 8/17/2021 12:10 PM, Evan Quan wrote: > >>>>> If the powergating of UVD/VCE is in process, wait for its > >>>>> completion before proceeding(suspending). This can fix some hangs > >>>>> observed on suspending when UVD/VCE still using(e.g. issue > >>>>> "pm-suspend" when video is still playing). > >>>>> > >>>>> Change-Id: I36f39d9731e0a9638b52d5d92558b0ee9c23a9ed > >>>>> Signed-off-by: Evan Quan <evan.quan@xxxxxxx> > >>>>> Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx> > >>>>> --- > >>>>> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 5 +++++ > >>>>> drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 5 +++++ > >>>>> 2 files changed, 10 insertions(+) > >>>>> > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c > >>>>> index 4eebf973a065..2fdce572baeb 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c > >>>>> @@ -554,6 +554,11 @@ static int uvd_v6_0_suspend(void *handle) > >>>>> int r; > >>>>> struct amdgpu_device *adev = (struct amdgpu_device > *)handle; > >>>>> > >>>>> + /* > >>>>> + * If the powergating is in process, wait for its completion. > >>>>> + */ > >>>>> + flush_delayed_work(&adev->uvd.idle_work); > >>>>> + > >>>> If running idle is a prerequisite before going to suspend, then > >>>> something else is missing here. > >>>> > >>>> Otherwise, the hang looks more like a pending work launched after > >>>> hardware is suspended and trying to access hardware. As the > >>>> hardware is going to be suspended anyway, doesn't it work with > >>>> cancel_delayed_work_sync - making sure that nothing is going to be > >>>> launched later to access hardware? > >>> [Quan, Evan] The reason we chose flush_delayed_work instead of > >> cancel_delayed_work_sync is we think those operations performed in > >> idle_work(dpm disablement, powergating) seems needed considering > the > >> action is 'suspend'. So, instead of "cancel", maybe waiting for them > >> completion is more proper. > >> > >> But it will do so only if the work is scheduled - so it doesn't seem > >> to be a prerequisite for suspend. If it was a prerequisite, then the > >> existing code is missing that (so that it gets done for all cases). > > [Quan, Evan] Just confirmed that cancel_delayed_work_sync() alone > cannot work. > > In fact, our current driver already get cancel_delayed_work_sync() > called(e.g. in amdgpu_uvd_suspend() on the path of uvd_v6_0_suspend). > > Thanks Evan for checking this further. > > If cancel_delayed_work_sync() is called in amdgpu_uvd_suspend(), then it > means that originally executing idle_work was not considered as a > prerequisite for suspending. > > _uvd_suspend() is called "after" uvd_v6_0_hw_fini(adev), that maybe a little > too late. > > > To get the issue fixed, it has to be either: > > 1. flush_delayed_work() > > Or > > 2. cancel_delayed_work_sync + amdgpu_dpm_enable_uvd/vce(adev, > false) > > (the job performed in idle work) > > At minimum, it proves that disabling dpm is required for suspend. > > > Btw, I do not think flush_delayed_work() is an incomplete fix. Since the > UVD/VCE idle work is appended on ring->funcs->end_use. > > So, as long as the UVD/VCE ring used and ended(it will since at least there is > ring/ib test), there will be a chance to get the work waited and completed. > > I agree that it fixes the issue, only thing is someone should look further to > provide the right sequence of uvd suspend. > > It doesn't give a root cause/right sequence - it only tells that forcing to > execute idle_work fixes the problem probably due to an extra delay added > by increased execution time or it disables DPM or something else. > Someone should confirm that all of idle_work or a part of it (as dpm > disable) is required for proper suspend sequence. > > That said, I don't have any objections to this fix either. If there are other > things, probably fix it this way and move on. [Quan, Evan] Thanks Lijo. I checked more yesterday. I found the issue was not seen on sienna cichlid and I checked its related source code(vcn_v3_0.c). I think I might got some clues what's the right sequence and the reason why adding amdgpu_dpm_enable_uvd/vce(adev,false) can address the issue. For vcn_v3_0.c, the cleanups performed on suspend: - cancel_delayed_work_sync - vcn_v3_0_set_powergating_state() - There is actually more jobs performed than just powergating. The naming is a little confusing. - what it does are: powergating enablement + clockgating enablement + dpm disablement For uvd_v6_0.c(where the issue was found), the above cleanups are missing. But if "cancel_delayed_work_sync + amdgpu_dpm_enable_uvd/vce(adev," added, the sequence will be just like above. That might explain why it works. - cancel_delayed_work_sync - amdgpu_dpm_enable_uvd/vce() - Again, the naming is confusing. It actually performs more than just dpm disablement/enablmenet. - What it does are: powergating enablement + clockgating enablement + dpm disablement So, to me it seems proper to introduce a quick fix by " cancel_delayed_work_sync + amdgpu_dpm_enable_uvd/vce()". A more proper long-term solution may be as the latest VCN(to put all the necessary cleanups in ->set_powergating_state()). But that will affect more than S3. I will leave that to Leo or James who are experts on UVD/VCE/VCN relates. BR Evan > false) > > Thanks, > Lijo > > > > > BR > > Evan > >> > >>>> > >>>> Then this may be a potential issue for other suspend calls also > >>>> where work is pending to be launched when hardware is suspended. > >>> [Quan, Evan] Do you mean we need to check whether there is similar > >>> issue > >> for other IPs? > >>> > >> > >> Yes, if there are cases where other IPs may schedule a delayed work > >> and call hw_fini without cancelling the work. > >> > >> Thanks, > >> Lijo > >> > >>> BR > >>> Evan > >>>> > >>>> Thanks, > >>>> Lijo > >>>> > >>>>> r = uvd_v6_0_hw_fini(adev); > >>>>> if (r) > >>>>> return r; > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c > >>>>> index 6d9108fa22e0..f0adecd5ec0b 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c > >>>>> @@ -503,6 +503,11 @@ static int vce_v3_0_suspend(void *handle) > >>>>> int r; > >>>>> struct amdgpu_device *adev = (struct amdgpu_device > *)handle; > >>>>> > >>>>> + /* > >>>>> + * If the powergating is in process, wait for its completion. > >>>>> + */ > >>>>> + flush_delayed_work(&adev->vce.idle_work); > >>>>> + > >>>>> r = vce_v3_0_hw_fini(adev); > >>>>> if (r) > >>>>> return r; > >>>>>