[PATCH] drm/amdgpu: Fix S3 resume failre.

harry.wentland@xxxxxxx (Harry Wentland) · Thu, 19 Jul 2018 15:37:39 -0400



On 2018-07-19 02:30 PM, Alex Deucher wrote:
> On Thu, Jul 19, 2018 at 1:07 PM, Andrey Grodzovsky
> <Andrey.Grodzovsky at amd.com> wrote:
>>
>>
>> On 07/19/2018 12:59 PM, Michel DÃ¤nzer wrote:
>>>
>>> On 2018-07-19 06:53 PM, Andrey Grodzovsky wrote:
>>>>
>>>>
>>>> On 07/19/2018 12:47 PM, Michel DÃ¤nzer wrote:
>>>>>
>>>>> On 2018-07-19 06:33 PM, Andrey Grodzovsky wrote:
>>>>>>
>>>>>> On 07/19/2018 11:39 AM, Ville SyrjÃ¤lÃ¤ wrote:
>>>>>>>
>>>>>>> On Thu, Jul 19, 2018 at 11:19:56AM -0400, Andrey Grodzovsky wrote:
>>>>>>>>
>>>>>>>> Problem:
>>>>>>>> FB is still not unpinned during the first run of amdgpu_bo_evict_vram
>>>>>>>> and so it's left for the second run, but during second run the SDMA
>>>>>>>> for
>>>>>>>> moving buffer around already disabled and you have to do
>>>>>>>> it with CPU, but FB is not in visible VRAM and hence the eviction
>>>>>>>> failure
>>>>>>>> leading later to resume failure.
>>>>>>>>
>>>>>>>> Fix:
>>>>>>>> When DAL in use get a pointer to FB from crtc->primary->state rather
>>>>>>>> then from crtc->primary which is not set for DAL since it supports
>>>>>>>> atomic KMS.
>>>>>>>>
>>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107065
>>>>>>>> Fixes e00fb85 drm: Stop updating plane->crtc/fb/old_fb on atomic
>>>>>>>> drivers
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>>>>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>> index 709e4a3..dd9ebf7 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>> @@ -2642,7 +2642,8 @@ int amdgpu_device_suspend(struct drm_device
>>>>>>>> *dev, bool suspend, bool fbcon)
>>>>>>>>         /* unpin the front buffers and cursors */
>>>>>>>>         list_for_each_entry(crtc, &dev->mode_config.crtc_list, head)
>>>>>>>> {
>>>>>>>>             struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
>>>>>>>> -        struct drm_framebuffer *fb = crtc->primary->fb;
>>>>>>>> +         struct drm_framebuffer *fb =
>>>>>>>> amdgpu_device_has_dc_support(adev) ?
>>>>>>>> +                 crtc->primary->state->fb : crtc->primary->fb;
>>>>>>>
>>>>>>> So apparently you haven't yet turned off the planes here. If I'm
>>>>>>> reading things right amdgpu_device_ip_suspend() should end up doing
>>>>>>> that through drm_atomic_helper_suspend(). So it looks like like now
>>>>>>> you'll end up unpinning the same bos twice. Doesn't that mess up
>>>>>>> some kind of refcount or something?
>>>>>>
>>>>>> amdgpu_bo_unpin has a guard against that, amdgpu_bo_unreserve is less
>>>>>> clear.
>>>>>
>>>>> BO reservation shouldn't an issue here, BOs are only reserved for a
>>>>> short time around (un)pinning them.
>>>>>
>>>>>
>>>>>>> To me it would seem better to susped the display before trying
>>>>>>> to evict the bos.
>>>>>>
>>>>>> Yea, i was aware of that and indeed DAL shouldn't rely on the code in
>>>>>> amdgpu_device_suspend to unpin
>>>>>> front buffer and cursor since the atomic helper should do it. Problem
>>>>>> is
>>>>>> that during amdgpu_device_ip_suspend
>>>>>> the SDMA engine gets suspended too, so you have to embed another
>>>>>> eviction in between, after display is suspended but before
>>>>>> SDMA and this forces ordering between them which kind of already in
>>>>>> place (amd_ip_block_type) but still it's an extra constrain.
>>>>>
>>>>> Ville's point (which I basically agree with) is that the display
>>>>> hardware should be turned off before evicting VRAM the first time, in
>>>>> which case no second eviction should be necessary (for this purpose).
>>>>
>>>> Display HW is turned off as part of all IPs in a loop inside
>>>> amdgpu_device_ip_suspend.
>>>> Are you suggesting to extract the  display HW turn off from inside
>>>> amdgpu_device_ip_suspend and place it
>>>> before the first call to amdgpu_bo_evict_vram ?
>>>
>>> In a nutshell, yes.
>>>
>>> Or maybe it would be easier to move the amdgpu_bo_evict_vram call down
>>> to somewhere called from amdgpu_device_ip_suspend?
>>
>>
>> I can move the BEFORE and AFTER calls to amdgpu_bo_evict_vram inside
>> amdgpu_device_ip_suspend
>> such that the first one is called AFTER display is shut off, while the
>> second in the very end of the function.
>> I am just not sure what's gonna be the side effect of evicting after bunch
>> of blocks (such as GMC) are already disabled.
> 
> How about something like the attached patches?  Basically split the ip
> suspend sequence in two like we do for resume.
> 

Patches are
Acked-by: Harry Wentland <harry.wentland at amd.com>

Harry

> Alex
>