On 07/19/2018 04:17 PM, Andrey Grodzovsky wrote:
>
> On 07/19/2018 03:37 PM, Harry Wentland wrote:
>> On 2018-07-19 02:30 PM, Alex Deucher wrote:
>>> On Thu, Jul 19, 2018 at 1:07 PM, Andrey Grodzovsky
>>> <Andrey.Grodzovsky at amd.com> wrote:
>>>>
>>>> On 07/19/2018 12:59 PM, Michel Dänzer wrote:
>>>>> On 2018-07-19 06:53 PM, Andrey Grodzovsky wrote:
>>>>>>
>>>>>> On 07/19/2018 12:47 PM, Michel Dänzer wrote:
>>>>>>> On 2018-07-19 06:33 PM, Andrey Grodzovsky wrote:
>>>>>>>> On 07/19/2018 11:39 AM, Ville Syrjälä wrote:
>>>>>>>>> On Thu, Jul 19, 2018 at 11:19:56AM -0400, Andrey Grodzovsky wrote:
>>>>>>>>>> Problem:
>>>>>>>>>> The FB is still not unpinned during the first run of
>>>>>>>>>> amdgpu_bo_evict_vram, so it is left for the second run. By then,
>>>>>>>>>> however, the SDMA engine for moving buffers around is already
>>>>>>>>>> disabled and the move has to be done with the CPU. Since the FB
>>>>>>>>>> is not in visible VRAM, the eviction fails, leading later to a
>>>>>>>>>> resume failure.
>>>>>>>>>>
>>>>>>>>>> Fix:
>>>>>>>>>> When DAL is in use, get the FB pointer from crtc->primary->state
>>>>>>>>>> rather than from crtc->primary, which is not set for DAL since
>>>>>>>>>> it supports atomic KMS.
>>>>>>>>>>
>>>>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107065
>>>>>>>>>> Fixes: e00fb85 ("drm: Stop updating plane->crtc/fb/old_fb on atomic drivers")
>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>>>>>>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>> index 709e4a3..dd9ebf7 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>> @@ -2642,7 +2642,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool suspend, bool fbcon)
>>>>>>>>>>  	/* unpin the front buffers and cursors */
>>>>>>>>>>  	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
>>>>>>>>>>  		struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
>>>>>>>>>> -		struct drm_framebuffer *fb = crtc->primary->fb;
>>>>>>>>>> +		struct drm_framebuffer *fb = amdgpu_device_has_dc_support(adev) ?
>>>>>>>>>> +			crtc->primary->state->fb : crtc->primary->fb;
>>>>>>>>> So apparently you haven't yet turned off the planes here. If I'm
>>>>>>>>> reading things right, amdgpu_device_ip_suspend() should end up
>>>>>>>>> doing that through drm_atomic_helper_suspend(). So it looks like
>>>>>>>>> now you'll end up unpinning the same BOs twice. Doesn't that mess
>>>>>>>>> up some kind of refcount or something?
>>>>>>>> amdgpu_bo_unpin has a guard against that; amdgpu_bo_unreserve is
>>>>>>>> less clear.
>>>>>>> BO reservation shouldn't be an issue here, BOs are only reserved
>>>>>>> for a short time around (un)pinning them.
>>>>>>>
>>>>>>>>> To me it would seem better to suspend the display before trying
>>>>>>>>> to evict the BOs.
>>>>>>>> Yeah, I was aware of that, and indeed DAL shouldn't rely on the
>>>>>>>> code in amdgpu_device_suspend to unpin the front buffer and
>>>>>>>> cursor, since the atomic helper should do it. The problem is that
>>>>>>>> during amdgpu_device_ip_suspend the SDMA engine gets suspended
>>>>>>>> too, so you have to embed another eviction in between, after the
>>>>>>>> display is suspended but before SDMA. This forces an ordering
>>>>>>>> between them which is kind of already in place
>>>>>>>> (amd_ip_block_type), but it's still an extra constraint.
>>>>>>> Ville's point (which I basically agree with) is that the display
>>>>>>> hardware should be turned off before evicting VRAM the first time,
>>>>>>> in which case no second eviction should be necessary (for this
>>>>>>> purpose).
>>>>>> Display HW is turned off as part of all IPs in a loop inside
>>>>>> amdgpu_device_ip_suspend.
>>>>>> Are you suggesting to extract the display HW turn-off from inside
>>>>>> amdgpu_device_ip_suspend and place it before the first call to
>>>>>> amdgpu_bo_evict_vram?
>>>>> In a nutshell, yes.
>>>>>
>>>>> Or maybe it would be easier to move the amdgpu_bo_evict_vram call
>>>>> down to somewhere called from amdgpu_device_ip_suspend?
>>>>
>>>> I can move the BEFORE and AFTER calls to amdgpu_bo_evict_vram inside
>>>> amdgpu_device_ip_suspend such that the first one is called AFTER the
>>>> display is shut off, while the second is at the very end of the
>>>> function. I'm just not sure what the side effects of evicting after a
>>>> bunch of blocks (such as GMC) are already disabled would be.
>>> How about something like the attached patches? Basically split the IP
>>> suspend sequence in two like we do for resume.
>>>
>> Patches are
>> Acked-by: Harry Wentland <harry.wentland at amd.com>
>>
>> Harry
>>
>>> Alex
>
> Patches look good indeed, but on the second S3 in a row I get a warning
> in dma_fence_is_later about fence contexts not being equal. I will have
> to take a look at why that is.
>
> Andrey

Seems like amdgpu_ttm_set_buffer_funcs_status destroys adev->mman.entity
on suspend without releasing the adev->mman.bdev.man[TTM_PL_VRAM].move
fence, so on resume the new drm_sched_entity.fence_context triggers the
warning against the old fence context, which is different.

BTW, this happens with my original change too, I just hadn't noticed.

Andrey

>
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>