Hi
I've just noticed something similar when starting weston. I still see it with this patch applied, but not on Linus's tree.
I'll confirm for sure tomorrow and send the stack trace if I can save it.
Cheers
Mike
On Tue, 3 Aug 2021 at 02:56, Chen, Guchun <Guchun.Chen@xxxxxxx> wrote:
[Public]
Hi Alex,
I submitted the patch before your message arrived, so I will take care of this next time.
Regards,
Guchun
-----Original Message-----
From: Alex Deucher <alexdeucher@xxxxxxxxx>
Sent: Monday, August 2, 2021 9:35 PM
To: Chen, Guchun <Guchun.Chen@xxxxxxx>
Cc: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Gao, Likun <Likun.Gao@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)
On Mon, Aug 2, 2021 at 4:23 AM Chen, Guchun <Guchun.Chen@xxxxxxx> wrote:
>
> [Public]
>
> Thank you, Christian.
>
> Regarding fence_drv.initialized, it looks a bit redundant; anyway, let me look into this more.
Does this patch fix this bug?
https://nam11.safelinks.protection.outlook.com/?url=
If so, please add:
Bug: https://nam11.safelinks.protection.outlook.com/?url=
to the commit message.
Alex
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
> Sent: Monday, August 2, 2021 2:56 PM
> To: Chen, Guchun <Guchun.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
> Gao, Likun <Likun.Gao@xxxxxxx>; Koenig, Christian
> <Christian.Koenig@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>;
> Deucher, Alexander <Alexander.Deucher@xxxxxxx>
> Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver
> fini in s3 test (v2)
>
> Am 02.08.21 um 07:16 schrieb Guchun Chen:
> > amdgpu_fence_driver_hw_fini does not need to call drm_sched_fini to
> > stop the scheduler in the s3 test; doing so causes fence-related
> > failures after resume. To fix this and clean up, move
> > drm_sched_fini from fence_hw_fini to fence_sw_fini, as it is part of
> > driver shutdown and should never be called in hw_fini.
> >
> > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
> > to keep sw_init and sw_fini paired.
> >
> > Fixes: cd87a6dcf6af ("drm/amdgpu: adjust fence driver enable sequence")
> > Suggested-by: Christian König <christian.koenig@xxxxxxx>
> > Signed-off-by: Guchun Chen <guchun.chen@xxxxxxx>
>
> It's a bit ambiguous now what fence_drv.initialized means, but I think we can live with that for now.
>
> Patch is Reviewed-by: Christian König <christian.koenig@xxxxxxx>.
>
> Regards,
> Christian.
>
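The failure mode described in the commit message can be sketched with a toy model. This is only an illustration under stated assumptions: `toy_ring`, `toy_sw_init`, `toy_hw_init`, `toy_hw_fini`, `toy_sw_fini` and `toy_usable_after_s3` are hypothetical names, not amdgpu or DRM scheduler APIs, and two booleans stand in for the real scheduler and hardware state. The model assumes, as the resume hunk in the patch suggests, that resume re-runs only the hw init step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ring; not the real struct amdgpu_ring. */
struct toy_ring {
	bool sched_ready; /* software state: scheduler exists and accepts jobs */
	bool hw_enabled;  /* hardware state: fence processing is enabled */
};

/* Software setup: assumed to run once, at driver load. */
static void toy_sw_init(struct toy_ring *ring)
{
	ring->sched_ready = true;
}

/* Hardware setup: assumed to run at load and again on every resume. */
static void toy_hw_init(struct toy_ring *ring)
{
	ring->hw_enabled = true;
}

/*
 * Hardware teardown: runs on suspend.
 * stop_sched == true models the old behaviour, where the scheduler
 * was also torn down here.
 */
static void toy_hw_fini(struct toy_ring *ring, bool stop_sched)
{
	if (stop_sched)
		ring->sched_ready = false;
	ring->hw_enabled = false;
}

/* Software teardown: runs once, at driver unload. */
static void toy_sw_fini(struct toy_ring *ring)
{
	ring->sched_ready = false;
}

/* Returns true if the ring can still run jobs after one suspend/resume cycle. */
bool toy_usable_after_s3(bool stop_sched_in_hw_fini)
{
	struct toy_ring ring = { false, false };
	bool usable;

	toy_sw_init(&ring);
	toy_hw_init(&ring);

	toy_hw_fini(&ring, stop_sched_in_hw_fini); /* suspend */
	toy_hw_init(&ring);                        /* resume: no sw init here */
	assert(ring.hw_enabled);

	usable = ring.sched_ready && ring.hw_enabled;

	toy_sw_fini(&ring); /* driver unload */
	return usable;
}
```

In this model, `toy_usable_after_s3(true)` reports an unusable ring because nothing on the resume path brings the scheduler back, while `toy_usable_after_s3(false)` keeps it alive until unload. That is the pairing argument for doing the scheduler teardown in the sw fini step.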
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 ++---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 12 +++++++-----
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 4 ++--
> > 3 files changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index b1d2dc39e8be..9e53ff851496 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >
> > fence_driver_init:
> > /* Fence driver */
> > - r = amdgpu_fence_driver_init(adev);
> > + r = amdgpu_fence_driver_sw_init(adev);
> > if (r) {
> > - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
> > + dev_err(adev->dev, "amdgpu_fence_driver_sw_init failed\n");
> > amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 0);
> > goto failed;
> > }
> > @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> > }
> > amdgpu_fence_driver_hw_init(adev);
> >
> > -
> > r = amdgpu_device_ip_late_init(adev);
> > if (r)
> > return r;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 49c5c7331c53..7495911516c2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> > }
> >
> > /**
> > - * amdgpu_fence_driver_init - init the fence driver
> > + * amdgpu_fence_driver_sw_init - init the fence driver
> > * for all possible rings.
> > *
> > * @adev: amdgpu device pointer
> > @@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> > * amdgpu_fence_driver_start_ring().
> > * Returns 0 for success.
> > */
> > -int amdgpu_fence_driver_init(struct amdgpu_device *adev)
> > +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
> > {
> > return 0;
> > }
> >
> > /**
> > - * amdgpu_fence_driver_fini - tear down the fence driver
> > + * amdgpu_fence_driver_hw_fini - tear down the fence driver
> > * for all possible rings.
> > *
> > * @adev: amdgpu device pointer
> > @@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
> >
> > if (!ring || !ring->fence_drv.initialized)
> > continue;
> > - if (!ring->no_scheduler)
> > - drm_sched_fini(&ring->sched);
> > +
> > /* You can't wait for HW to signal if it's gone */
> > if (!drm_dev_is_unplugged(&adev->ddev))
> > r = amdgpu_fence_wait_empty(ring);
> > @@ -560,6 +559,9 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
> > if (!ring || !ring->fence_drv.initialized)
> > continue;
> >
> > + if (!ring->no_scheduler)
> > + drm_sched_fini(&ring->sched);
> > +
> > for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
> > dma_fence_put(ring->fence_drv.fences[j]);
> > kfree(ring->fence_drv.fences);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > index 27adffa7658d..9c11ced4312c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > @@ -106,7 +106,6 @@ struct amdgpu_fence_driver {
> > struct dma_fence **fences;
> > };
> >
> > -int amdgpu_fence_driver_init(struct amdgpu_device *adev);
> > void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
> >
> > int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> > @@ -115,9 +114,10 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> > int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
> > struct amdgpu_irq_src *irq_src,
> > unsigned irq_type);
> > +void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev);
> > void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev);
> > +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev);
> > void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev);
> > -void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev);
> > int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **fence,
> > unsigned flags);
> > int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s,