Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

I suspect the issue is that amdgpu_dm_do_flip is holding the BO reserved 
and is then stuck waiting for fences to signal in 
reservation_object_wait_timeout_rcu (which they never will, because there 
was a VM_FAULT). Then, when we try to shut down the display block during 
reset recovery from drm_atomic_helper_suspend, we also try to reserve the 
BO, probably from dm_plane_helper_cleanup_fb, ending in a deadlock.
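
For anyone following along, the suspected lock ordering can be modelled in 
userspace roughly as below. This is only a sketch with pthreads and not 
amdgpu code: struct fake_bo, flip_thread and cleanup_blocked_by_flip are 
made-up names, the mutex stands in for the BO reservation, and the flag 
that is never set stands in for the fence that cannot signal after the 
VM_FAULT. Both waits are bounded so the program terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a BO; none of this is real amdgpu code. */
struct fake_bo {
	pthread_mutex_t resv;	/* models the BO reservation */
	pthread_mutex_t lock;	/* protects the two flags below */
	pthread_cond_t cond;
	bool flip_has_resv;	/* flip path has taken the reservation */
	bool fence_signaled;	/* never set: models the lost fence */
};

struct flip_args {
	struct fake_bo *bo;
	long fence_wait_ms;
};

/* Absolute CLOCK_REALTIME deadline ms milliseconds from now. */
static struct timespec deadline_in_ms(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec += 1;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* Flip path: reserve the BO, then wait for the fence while holding it. */
static void *flip_thread(void *data)
{
	struct flip_args *args = data;
	struct fake_bo *bo = args->bo;
	struct timespec until = deadline_in_ms(args->fence_wait_ms);

	pthread_mutex_lock(&bo->resv);
	pthread_mutex_lock(&bo->lock);
	bo->flip_has_resv = true;
	pthread_cond_broadcast(&bo->cond);
	while (!bo->fence_signaled) {
		if (pthread_cond_timedwait(&bo->cond, &bo->lock, &until) == ETIMEDOUT)
			break;	/* bounded wait instead of waiting forever */
	}
	pthread_mutex_unlock(&bo->lock);
	pthread_mutex_unlock(&bo->resv);
	return NULL;
}

/*
 * Cleanup path: try to reserve the same BO once the flip path owns it.
 * Returns true if the reservation could not be taken within
 * cleanup_wait_ms, i.e. the cleanup path is blocked behind the flip path.
 */
static bool cleanup_blocked_by_flip(long fence_wait_ms, long cleanup_wait_ms)
{
	struct fake_bo bo = { .flip_has_resv = false, .fence_signaled = false };
	struct flip_args args = { .bo = &bo, .fence_wait_ms = fence_wait_ms };
	struct timespec until;
	pthread_t flip;
	bool blocked;
	int err;

	pthread_mutex_init(&bo.resv, NULL);
	pthread_mutex_init(&bo.lock, NULL);
	pthread_cond_init(&bo.cond, NULL);

	err = pthread_create(&flip, NULL, flip_thread, &args);
	assert(err == 0);
	(void)err;

	/* Wait until the flip path holds the reservation. */
	pthread_mutex_lock(&bo.lock);
	while (!bo.flip_has_resv)
		pthread_cond_wait(&bo.cond, &bo.lock);
	pthread_mutex_unlock(&bo.lock);

	until = deadline_in_ms(cleanup_wait_ms);
	blocked = pthread_mutex_timedlock(&bo.resv, &until) == ETIMEDOUT;
	if (!blocked)
		pthread_mutex_unlock(&bo.resv);

	pthread_join(flip, NULL);
	pthread_cond_destroy(&bo.cond);
	pthread_mutex_destroy(&bo.lock);
	pthread_mutex_destroy(&bo.resv);
	return blocked;
}
```

Calling cleanup_blocked_by_flip(500, 50) models the cleanup path giving up 
while the flip path is still waiting on the fence; calling it with a short 
fence wait and a long cleanup wait, e.g. (50, 2000), models what a bounded 
fence wait in the flip path would allow. Build with -pthread.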

To confirm, I am attaching a patch with some printks around the BO 
reservations. Please apply it and rerun.

Also, it is probably a good idea to open an FDO ticket for this instead 
of using amd-gfx.

Andrey


On 2/12/19 10:49 AM, Mikhail Gavrilov wrote:
> On Tue, 12 Feb 2019 at 20:23, Grodzovsky, Andrey
> <Andrey.Grodzovsky@xxxxxxx> wrote:
>> It should recover for you, so this looks like a bug. In one of the
>> call traces I noticed drm_atomic_helper_suspend, which points to the
>> system going into sleep mode. Is that what happened? Did it hang when
>> the system tried to sleep?
>>
> It's weird, because the computer did not enter sleep mode. I am sure.
> Steps to reproduce:
> 1. Launch Shadow of the Tomb Raider on Proton
> 2. Wait some time until the mouse stops responding
> 3. Dump gfx, waves and all other dumps including dmesg
>
> And of course the power button (the button which enters sleep mode) was
> not pressed.
>
> So do the new dumps have any new useful info? Or are they pointless?
> --
> Best Regards,
> Mike Gavrilov.
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index d59bafc..e15cd3c 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2353,6 +2353,8 @@ static int get_fb_info(const struct amdgpu_framebuffer *amdgpu_fb,
                       uint64_t *tiling_flags)
 {
        struct amdgpu_bo *rbo = gem_to_amdgpu_bo(amdgpu_fb->base.obj[0]);
+
+       DRM_ERROR("Before %p\n",rbo);
        int r = amdgpu_bo_reserve(rbo, false);
 
        if (unlikely(r)) {
@@ -2362,6 +2364,8 @@ static int get_fb_info(const struct amdgpu_framebuffer *amdgpu_fb,
                return r;
        }
 
+       DRM_ERROR("After %p\n",rbo);
+
        if (tiling_flags)
                amdgpu_bo_get_tiling_flags(rbo, tiling_flags);
 
@@ -3715,9 +3719,11 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
        obj = new_state->fb->obj[0];
        rbo = gem_to_amdgpu_bo(obj);
        adev = amdgpu_ttm_adev(rbo->tbo.bdev);
+       DRM_ERROR("Before %p\n",rbo);
        r = amdgpu_bo_reserve(rbo, false);
        if (unlikely(r != 0))
                return r;
+       DRM_ERROR("After %p\n",rbo);
 
        if (plane->type != DRM_PLANE_TYPE_CURSOR)
                domain = amdgpu_display_supported_domains(adev);
@@ -3790,11 +3796,13 @@ static void dm_plane_helper_cleanup_fb(struct drm_plane *plane,
                return;
 
        rbo = gem_to_amdgpu_bo(old_state->fb->obj[0]);
+       DRM_ERROR("Before %d\n",__LINE__);
        r = amdgpu_bo_reserve(rbo, false);
        if (unlikely(r)) {
                DRM_ERROR("failed to reserve rbo before unpin\n");
                return;
        }
+       DRM_ERROR("After %d\n",__LINE__);
 
        amdgpu_bo_unpin(rbo);
        amdgpu_bo_unreserve(rbo);
@@ -4801,15 +4809,17 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
                         * blocking commit to as per framework helpers
                         */
                        abo = gem_to_amdgpu_bo(fb->obj[0]);
+                       DRM_ERROR("Before %p\n",abo);
                        r = amdgpu_bo_reserve(abo, true);
                        if (unlikely(r != 0)) {
                                DRM_ERROR("failed to reserve buffer before flip\n");
                                WARN_ON(1);
                        }
-
+                       DRM_ERROR("After %p\n",abo);
                        /* Wait for all fences on this FB */
                        WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
-                                                                                   MAX_SCHEDULE_TIMEOUT) < 0);
+                                       msecs_to_jiffies(5000)) < 0);
+                       DRM_ERROR("After  reservation_object_wait_timeout_rcu %p\n",abo);
 
                        amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
