[PATCH] drm/amdgpu: Fix S3 resume failure.

On 07/19/2018 03:37 PM, Harry Wentland wrote:
> On 2018-07-19 02:30 PM, Alex Deucher wrote:
>> On Thu, Jul 19, 2018 at 1:07 PM, Andrey Grodzovsky
>> <Andrey.Grodzovsky at amd.com> wrote:
>>>
>>> On 07/19/2018 12:59 PM, Michel Dänzer wrote:
>>>> On 2018-07-19 06:53 PM, Andrey Grodzovsky wrote:
>>>>>
>>>>> On 07/19/2018 12:47 PM, Michel Dänzer wrote:
>>>>>> On 2018-07-19 06:33 PM, Andrey Grodzovsky wrote:
>>>>>>> On 07/19/2018 11:39 AM, Ville Syrjälä wrote:
>>>>>>>> On Thu, Jul 19, 2018 at 11:19:56AM -0400, Andrey Grodzovsky wrote:
>>>>>>>>> Problem:
>>>>>>>>> The FB is still not unpinned during the first run of amdgpu_bo_evict_vram
>>>>>>>>> and so it's left for the second run. By the second run the SDMA engine
>>>>>>>>> used for moving buffers around is already disabled, so the move has to
>>>>>>>>> be done with the CPU, but the FB is not in visible VRAM. Hence the
>>>>>>>>> eviction fails, leading later to the resume failure.
>>>>>>>>>
>>>>>>>>> Fix:
>>>>>>>>> When DAL is in use, get the pointer to the FB from crtc->primary->state
>>>>>>>>> rather than from crtc->primary->fb, which is not set for DAL since it
>>>>>>>>> supports atomic KMS.
>>>>>>>>>
>>>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107065
>>>>>>>>> Fixes: e00fb85 ("drm: Stop updating plane->crtc/fb/old_fb on atomic
>>>>>>>>> drivers")
>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>>>> ---
>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>>>>>>>>>      1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> index 709e4a3..dd9ebf7 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> @@ -2642,7 +2642,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool suspend, bool fbcon)
>>>>>>>>>  	/* unpin the front buffers and cursors */
>>>>>>>>>  	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
>>>>>>>>>  		struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
>>>>>>>>> -		struct drm_framebuffer *fb = crtc->primary->fb;
>>>>>>>>> +		struct drm_framebuffer *fb = amdgpu_device_has_dc_support(adev) ?
>>>>>>>>> +			crtc->primary->state->fb : crtc->primary->fb;
>>>>>>>> So apparently you haven't yet turned off the planes here. If I'm
>>>>>>>> reading things right amdgpu_device_ip_suspend() should end up doing
>>>>>>>> that through drm_atomic_helper_suspend(). So it looks like now
>>>>>>>> you'll end up unpinning the same bos twice. Doesn't that mess up
>>>>>>>> some kind of refcount or something?
>>>>>>> amdgpu_bo_unpin has a guard against that, amdgpu_bo_unreserve is less
>>>>>>> clear.
>>>>>> BO reservation shouldn't be an issue here; BOs are only reserved for a
>>>>>> short time around (un)pinning them.
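
For reference, the guard in question is just a pin-count check. A minimal
sketch of that pattern (illustrative only, not the actual amdgpu_bo_unpin()
code; the struct name is made up for the example):

	/* toy example of a pin-count guard against double unpin */
	struct toy_bo {
		unsigned int pin_count;
	};

	static int toy_bo_unpin(struct toy_bo *bo)
	{
		/* second unpin of an already unpinned BO just warns and bails */
		if (WARN_ON_ONCE(bo->pin_count == 0))
			return 0;

		/* still pinned by another user, nothing to release yet */
		if (--bo->pin_count > 0)
			return 0;

		/* pin count hit zero: this is where the placement would be released */
		return 0;
	}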
>>>>>>
>>>>>>
>>>>>>>> To me it would seem better to suspend the display before trying
>>>>>>>> to evict the bos.
>>>>>>> Yeah, I was aware of that, and indeed DAL shouldn't rely on the code in
>>>>>>> amdgpu_device_suspend to unpin the front buffer and cursor since the
>>>>>>> atomic helper should do it. The problem is that during
>>>>>>> amdgpu_device_ip_suspend the SDMA engine gets suspended too, so you have
>>>>>>> to embed another eviction in between, after the display is suspended but
>>>>>>> before SDMA. This forces an ordering between them which is kind of
>>>>>>> already in place (amd_ip_block_type), but it's still an extra constraint.
>>>>>> Ville's point (which I basically agree with) is that the display
>>>>>> hardware should be turned off before evicting VRAM the first time, in
>>>>>> which case no second eviction should be necessary (for this purpose).
>>>>> The display HW is turned off together with all the other IPs in a loop
>>>>> inside amdgpu_device_ip_suspend.
>>>>> Are you suggesting to pull the display HW turn-off out of
>>>>> amdgpu_device_ip_suspend and place it before the first call to
>>>>> amdgpu_bo_evict_vram?
>>>> In a nutshell, yes.
>>>>
>>>> Or maybe it would be easier to move the amdgpu_bo_evict_vram call down
>>>> to somewhere called from amdgpu_device_ip_suspend?
>>>
>>> I can move the BEFORE and AFTER calls to amdgpu_bo_evict_vram inside
>>> amdgpu_device_ip_suspend, such that the first one is called after the
>>> display is shut off while the second one comes at the very end of the
>>> function. I am just not sure what the side effects will be of evicting
>>> after a bunch of blocks (such as GMC) are already disabled.
>> How about something like the attached patches?  Basically split the ip
>> suspend sequence in two like we do for resume.
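
For reference, the split being suggested boils down to running the IP suspend
loop twice: once for the display block only, then VRAM eviction, then
everything else. A rough sketch of the idea (not the actual attached patches;
it only assumes the amdgpu_ip_block structures as they existed at the time):

	/* phase 1: suspend only the display (DCE/DM) block so the atomic
	 * helper unpins the front buffers while SDMA is still alive */
	static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
	{
		int i, r;

		for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
			if (!adev->ip_blocks[i].status.valid)
				continue;
			/* everything except DCE is handled in phase 2 */
			if (adev->ip_blocks[i].version->type != AMD_IP_BLOCK_TYPE_DCE)
				continue;
			r = adev->ip_blocks[i].version->funcs->suspend(adev);
			if (r)
				DRM_ERROR("suspend of IP block <%s> failed %d\n",
					  adev->ip_blocks[i].version->funcs->name, r);
		}
		return 0;
	}
	/* phase 2 would be the same loop, skipping DCE and covering
	 * SDMA, GMC and the rest, called after amdgpu_bo_evict_vram() */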
>>
> Patches are
> Acked-by: Harry Wentland <harry.wentland at amd.com>
>
> Harry
>
>> Alex

The patches look good indeed, but on the second S3 in a row I get a warning in
dma_fence_is_later about the fence contexts not being equal. I will have to
take a look at why that is.
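
For context, the check that fires is roughly this one (from
include/linux/dma-fence.h of that era), so the warning means the two fences
being compared belong to different fence contexts:

	static inline bool dma_fence_is_later(struct dma_fence *f1,
					      struct dma_fence *f2)
	{
		/* comparing seqnos is only meaningful within one context */
		if (WARN_ON(f1->context != f2->context))
			return false;

		return (int)(f1->seqno - f2->seqno) > 0;
	}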

Andrey

>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


