Re: [PATCH 5.15.y] drm/amdgpu: Set vmbo destroy after pt bo is created

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 28, 2023 at 06:16:36AM -0500, Mario Limonciello wrote:
> From: Philip Yang <Philip.Yang@xxxxxxx>
> 
> Under VRAM usage pression, map to GPU may fail to create pt bo and
> vmbo->shadow_list is not initialized, then ttm_bo_release calling
> amdgpu_bo_vm_destroy to access vmbo->shadow_list generates below
> dmesg and NULL pointer access backtrace:
> 
> Set vmbo destroy callback to amdgpu_bo_vm_destroy only after creating pt
> bo successfully, otherwise use default callback amdgpu_bo_destroy.
> 
> amdgpu: amdgpu_vm_bo_update failed
> amdgpu: update_gpuvm_pte() failed
> amdgpu: Failed to map bo to gpuvm
> amdgpu 0000:43:00.0: amdgpu: Failed to map peer:0000:43:00.0 mem_domain:2
> BUG: kernel NULL pointer dereference, address:
>  RIP: 0010:amdgpu_bo_vm_destroy+0x4d/0x80 [amdgpu]
>  Call Trace:
>   <TASK>
>   ttm_bo_release+0x207/0x320 [amdttm]
>   amdttm_bo_init_reserved+0x1d6/0x210 [amdttm]
>   amdgpu_bo_create+0x1ba/0x520 [amdgpu]
>   amdgpu_bo_create_vm+0x3a/0x80 [amdgpu]
>   amdgpu_vm_pt_create+0xde/0x270 [amdgpu]
>   amdgpu_vm_ptes_update+0x63b/0x710 [amdgpu]
>   amdgpu_vm_update_range+0x2e7/0x6e0 [amdgpu]
>   amdgpu_vm_bo_update+0x2bd/0x600 [amdgpu]
>   update_gpuvm_pte+0x160/0x420 [amdgpu]
>   amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x313/0x1130 [amdgpu]
>   kfd_ioctl_map_memory_to_gpu+0x115/0x390 [amdgpu]
>   kfd_ioctl+0x24a/0x5b0 [amdgpu]
> 
> Signed-off-by: Philip Yang <Philip.Yang@xxxxxxx>
> Reviewed-by: Christian König <christian.koenig@xxxxxxx>
> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
> 
> (cherry picked from commit 9a3c6067bd2ee2ca2652fbb0679f422f3c9109f9)
> 
> This fixes a regression introduced by commit 1cc40dccad76 ("drm/amdgpu:
> fix Null pointer dereference error in amdgpu_device_recover_vram") in
> 5.15.118. It's a hand modified cherry-pick because that commit that
> introduced the regression touched nearby code and the context is now
> incorrect.

Now queued up, thanks.

greg k-h



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux