Re: [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jun 21, 2020 at 02:03:07AM -0400, Andrey Grodzovsky wrote:
> entity->rq becomes null aftre device unplugged so just return early
> in that case.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>

That looks very deep in amdgpu internals ... how do you even get in here
after the device is fully unplugged on the sw side?

Is this amdkfd doing something stupid because entirely unaware of what
amdgpu has done? Something else? Just feels like this is just duct-taping
over a more fundamental problem, after hotunplug no one should be able to
even submit anything new, or do bo moves, or well anything really.
-Daniel

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index 8d9c6fe..d252427 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -24,6 +24,7 @@
>  #include "amdgpu_job.h"
>  #include "amdgpu_object.h"
>  #include "amdgpu_trace.h"
> +#include <drm/drm_drv.h>
>  
>  #define AMDGPU_VM_SDMA_MIN_NUM_DW	256u
>  #define AMDGPU_VM_SDMA_MAX_NUM_DW	(16u * 1024u)
> @@ -94,7 +95,12 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>  	struct drm_sched_entity *entity;
>  	struct amdgpu_ring *ring;
>  	struct dma_fence *f;
> -	int r;
> +	int r, idx;
> +
> +	if (!drm_dev_enter(p->adev->ddev, &idx)) {
> +		r = -ENODEV;
> +		goto nodev;
> +	}
>  
>  	entity = p->immediate ? &p->vm->immediate : &p->vm->delayed;
>  	ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
> @@ -104,7 +110,7 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>  	WARN_ON(ib->length_dw > p->num_dw_left);
>  	r = amdgpu_job_submit(p->job, entity, AMDGPU_FENCE_OWNER_VM, &f);
>  	if (r)
> -		goto error;
> +		goto job_fail;
>  
>  	if (p->unlocked) {
>  		struct dma_fence *tmp = dma_fence_get(f);
> @@ -118,10 +124,15 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>  	if (fence && !p->immediate)
>  		swap(*fence, f);
>  	dma_fence_put(f);
> -	return 0;
>  
> -error:
> -	amdgpu_job_free(p->job);
> +	r = 0;
> +
> +job_fail:
> +	drm_dev_exit(idx);
> +nodev:
> +	if (r)
> +		amdgpu_job_free(p->job);
> +
>  	return r;
>  }
>  
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux