[PATCH] drm/amdgpu: Fix S3 resume failure.

On Thu, Jul 19, 2018 at 1:07 PM, Andrey Grodzovsky
<Andrey.Grodzovsky at amd.com> wrote:
>
>
> On 07/19/2018 12:59 PM, Michel Dänzer wrote:
>>
>> On 2018-07-19 06:53 PM, Andrey Grodzovsky wrote:
>>>
>>>
>>> On 07/19/2018 12:47 PM, Michel Dänzer wrote:
>>>>
>>>> On 2018-07-19 06:33 PM, Andrey Grodzovsky wrote:
>>>>>
>>>>> On 07/19/2018 11:39 AM, Ville Syrjälä wrote:
>>>>>>
>>>>>> On Thu, Jul 19, 2018 at 11:19:56AM -0400, Andrey Grodzovsky wrote:
>>>>>>>
>>>>>>> Problem:
>>>>>>> The FB is still pinned during the first run of amdgpu_bo_evict_vram, so
>>>>>>> its eviction is left for the second run. But by the time of the second
>>>>>>> run, the SDMA engine used for moving buffers around is already
>>>>>>> disabled, so the move has to be done with the CPU. Since the FB is not
>>>>>>> in CPU-visible VRAM, the eviction fails, which later leads to the
>>>>>>> resume failure.
>>>>>>>
>>>>>>> Fix:
>>>>>>> When DAL is in use, get the FB pointer from crtc->primary->state rather
>>>>>>> than from crtc->primary, which is not kept up to date for DAL since it
>>>>>>> supports atomic KMS.
>>>>>>>
>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107065
>>>>>>> Fixes: e00fb85 ("drm: Stop updating plane->crtc/fb/old_fb on atomic drivers")
>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>>>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>> index 709e4a3..dd9ebf7 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>> @@ -2642,7 +2642,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool suspend, bool fbcon)
>>>>>>>         /* unpin the front buffers and cursors */
>>>>>>>         list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
>>>>>>>             struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
>>>>>>> -        struct drm_framebuffer *fb = crtc->primary->fb;
>>>>>>> +        struct drm_framebuffer *fb = amdgpu_device_has_dc_support(adev) ?
>>>>>>> +                crtc->primary->state->fb : crtc->primary->fb;
>>>>>>
>>>>>> So apparently you haven't yet turned off the planes here. If I'm
>>>>>> reading things right amdgpu_device_ip_suspend() should end up doing
>>>>>> that through drm_atomic_helper_suspend(). So it looks like now
>>>>>> you'll end up unpinning the same bos twice. Doesn't that mess up
>>>>>> some kind of refcount or something?
>>>>>
>>>>> amdgpu_bo_unpin has a guard against that; amdgpu_bo_unreserve is less
>>>>> clear.
>>>>
>>>> BO reservation shouldn't be an issue here; BOs are only reserved for a
>>>> short time around (un)pinning them.
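For reference, the guard mentioned above is the pin-count check in
amdgpu_bo_unpin(); a rough sketch of that pattern (paraphrased from memory,
not the exact upstream code of the time) is:

int amdgpu_bo_unpin(struct amdgpu_bo *bo)
{
        struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);

        /* Unpinning an already-unpinned BO only warns and bails out, so
         * running the unpin path a second time does not underflow anything.
         */
        if (WARN_ON_ONCE(bo->pin_count == 0)) {
                dev_warn(adev->dev, "%p unpin not necessary\n", bo);
                return 0;
        }

        bo->pin_count--;
        if (bo->pin_count)
                return 0;

        /* Only when the last pin reference drops is the placement actually
         * released back to TTM (placement validation and memory accounting
         * elided in this sketch).
         */
        return 0;
}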
>>>>
>>>>
>>>>>> To me it would seem better to suspend the display before trying
>>>>>> to evict the bos.
>>>>>
>>>>> Yeah, I was aware of that, and indeed DAL shouldn't rely on the code in
>>>>> amdgpu_device_suspend to unpin the front buffer and cursor, since the
>>>>> atomic helper should do it. The problem is that during
>>>>> amdgpu_device_ip_suspend the SDMA engine gets suspended too, so you have
>>>>> to embed another eviction in between, after the display is suspended but
>>>>> before SDMA. This forces an ordering between them which is kind of
>>>>> already in place (amd_ip_block_type), but it's still an extra constraint.
>>>>
>>>> Ville's point (which I basically agree with) is that the display
>>>> hardware should be turned off before evicting VRAM the first time, in
>>>> which case no second eviction should be necessary (for this purpose).
>>>
>>> The display HW is turned off along with all the other IPs in a loop inside
>>> amdgpu_device_ip_suspend.
>>> Are you suggesting to extract the display HW turn-off from
>>> amdgpu_device_ip_suspend and place it before the first call to
>>> amdgpu_bo_evict_vram?
>>
>> In a nutshell, yes.
>>
>> Or maybe it would be easier to move the amdgpu_bo_evict_vram call down
>> to somewhere called from amdgpu_device_ip_suspend?
>
>
> I can move the BEFORE and AFTER calls to amdgpu_bo_evict_vram inside
> amdgpu_device_ip_suspend, such that the first one is called after the
> display is shut off, while the second is called at the very end of the
> function.
> I am just not sure what the side effects would be of evicting after a bunch
> of blocks (such as GMC) are already disabled.

How about something like the attached patches?  Basically split the ip
suspend sequence in two like we do for resume.
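Since the attachments aren't inline, here is a rough sketch of that idea
(function names follow the attachment titles; this is only an illustration,
not the actual patches): phase 1 suspends just the display IP so its
framebuffers get unpinned, VRAM is then evicted while SDMA is still alive,
and phase 2 suspends the remaining IPs.

/* Phase 1: suspend only the display (DCE) IP block. */
static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
{
        int i, r;

        for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
                if (!adev->ip_blocks[i].status.valid)
                        continue;
                /* skip everything that is not the display */
                if (adev->ip_blocks[i].version->type != AMD_IP_BLOCK_TYPE_DCE)
                        continue;
                r = adev->ip_blocks[i].version->funcs->suspend(adev);
                if (r)
                        DRM_ERROR("suspend of IP block <%s> failed %d\n",
                                  adev->ip_blocks[i].version->funcs->name, r);
        }
        return 0;
}

/* Phase 2 would be the same loop with the DCE check inverted, taking down
 * GMC, SDMA and the rest. The caller in amdgpu_device_suspend() then becomes:
 *
 *      amdgpu_device_ip_suspend_phase1(adev);  // display down, FBs unpinned
 *      amdgpu_bo_evict_vram(adev);             // SDMA still usable here
 *      amdgpu_device_ip_suspend_phase2(adev);  // everything else
 */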

Alex
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-drm-amdgpu-rework-suspend-and-resume-to-deal-with-at.patch
Type: text/x-patch
Size: 4598 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180719/2783c206/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-amdgpu-split-ip-suspend-into-2-phases.patch
Type: text/x-patch
Size: 4412 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180719/2783c206/attachment-0001.bin>

