> In this case I think it would be much better to wait for the idle work before trying to unload the driver. I already did it: > + if (!amdgpu_sriov_vf(adev)) > + while (cancel_delayed_work_sync(&adev->late_init_work)) > + schedule(); /* to make sure late_init_work really stopped */ What do you mean never call "schedule()" this way ? Please show me how to do it to achieve the same goal and I can modify my patch /Monk -----Original Message----- From: Christian König [mailto:ckoenig.leichtzumerken@xxxxxxxxx] Sent: 2018å¹´2æ??26æ?¥ 18:08 To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org Subject: Re: [PATCH 09/22] drm/amdgpu: fix kmd reload bug on bare-metal Am 26.02.2018 um 06:18 schrieb Monk Liu: > issue: > on bare-metal when doing kmd reload test, there are chance that kernel > hit fatal error afer driver unloaded/reloaded > > fix: > the cause is that those "idle work" not really stopped and if kmd was > is unloaded too quick that were chance that "idle work" run after > driver structures already released which introduces this issue. In this case I think it would be much better to wait for the idle work before trying to unload the driver. > > Change-Id: Idb0f7db771e7ca60dba925d1d0f48b1de08dc89e > Signed-off-by: Monk Liu <Monk.Liu at amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 3 ++- > drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 4 +++- > 3 files changed, 12 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 54145ec..69fb5e50 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -1419,7 +1419,8 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev) > } > } > > - mod_delayed_work(system_wq, &adev->late_init_work, > + if (!amdgpu_sriov_vf(adev)) > + mod_delayed_work(system_wq, &adev->late_init_work, > msecs_to_jiffies(AMDGPU_RESUME_MS)); > > amdgpu_device_fill_reset_magic(adev); > @@ -2087,7 +2088,11 @@ void amdgpu_device_fini(struct amdgpu_device *adev) > adev->firmware.gpu_info_fw = NULL; > } > adev->accel_working = false; > - cancel_delayed_work_sync(&adev->late_init_work); > + > + if (!amdgpu_sriov_vf(adev)) > + while (cancel_delayed_work_sync(&adev->late_init_work)) > + schedule(); /* to make sure late_init_work really stopped */ Never use schedule() like that. Regards, Christian. > + > /* free i2c buses */ > if (!amdgpu_device_has_dc_support(adev)) > amdgpu_i2c_fini(adev); > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c > index caba610..337db57 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c > @@ -299,7 +299,8 @@ int amdgpu_uvd_suspend(struct amdgpu_device *adev) > return 0; > > if (!amdgpu_sriov_vf(adev)) > - cancel_delayed_work_sync(&adev->uvd.idle_work); > + while (cancel_delayed_work_sync(&adev->uvd.idle_work)) > + schedule(); /* to make sure idle work really stopped */ > > for (i = 0; i < adev->uvd.max_handles; ++i) > if (atomic_read(&adev->uvd.handles[i])) > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c > index a829350..2874fda 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c > @@ -243,7 +243,9 @@ int amdgpu_vce_suspend(struct amdgpu_device *adev) > return 0; > > if (!amdgpu_sriov_vf(adev)) > - cancel_delayed_work_sync(&adev->vce.idle_work); > + while (cancel_delayed_work_sync(&adev->vce.idle_work)) > + schedule(); /* to make sure the idle_work really stopped */ > + > /* TODO: suspending running encoding sessions isn't supported */ > return -EINVAL; > }