[Bug 111229] Unable to unbind GPU from amdgpu

bugzilla-daemon@xxxxxxxxxxxxxxx · Mon, 21 Oct 2019 07:22:46 +0000

Comment #11 on bug 111229 from Eugene Shatsky

Since my last comment I've used this about a dozen times to switch between the
Linux desktop and a Windows VM. Once, amdgpu crashed after resume from suspend,
but I'm not sure whether that was related to this bug, and I was still able to
reboot afterwards.
However, I still sometimes get this warning on unbind:

WARNING: CPU: 0 PID: 1109 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:929
amdgpu_bo_unpin+0xc8/0xf0 [amdgpu]
Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio fuse amdgpu
amd_iommu_v2 gpu_sched ttm xt_CHECKSUM xt_MASQUERADE ipt_REJECT nf_rejec>
 nf_conntrack nf_defrag_ipv4 libcrc32c zsmalloc ip6t_rpfilter ipt_rpfilter
ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_comm>
CPU: 0 PID: 1109 Comm: .libvirtd-wrapp Tainted: G           O      5.3.0-rc7
#1-NixOS
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H61M-DGS R2.0,
BIOS P1.10 10/01/2013
RIP: 0010:amdgpu_bo_unpin+0xc8/0xf0 [amdgpu]
Code: ff 48 83 c0 0c 48 39 d0 75 ea 48 8d 73 30 48 8d 7b 50 48 8d 54 24 08 e8
46 1f d8 ff 85 c0 74 a1 e9 30 6c 21 00 e8 28 f9 6b f5 <0f> 0b 48 8b >
RSP: 0018:ffffa4df00a4bd28 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8c60449a4800 RCX: 0000000000000002
RDX: ffff8c60423c9b00 RSI: 0000000000000000 RDI: ffff8c60449a4800
RBP: ffff8c6008fa4058 R08: 0000000000000000 R09: ffffffffc0b3c000
R10: ffff8c60449a2800 R11: 0000000000000001 R12: ffff8c6008fa6378
R13: ffff8c6008fa6370 R14: ffff8c6008fa4058 R15: ffff8c6008d7f260
FS:  00007fac9a81f700(0000) GS:ffff8c605f400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffea51ccff8 CR3: 00000004048c4003 CR4: 00000000001606f0
Call Trace:
 amdgpu_bo_free_kernel+0x6b/0x120 [amdgpu]
 amdgpu_gfx_rlc_fini+0x47/0x70 [amdgpu]
 gfx_v8_0_sw_fini+0xa1/0x1a0 [amdgpu]
 amdgpu_device_fini+0x257/0x479 [amdgpu]
 amdgpu_driver_unload_kms+0x4a/0x90 [amdgpu]
 drm_dev_unregister+0x4b/0xb0 [drm]
 amdgpu_pci_remove+0x25/0x50 [amdgpu]
 pci_device_remove+0x3b/0xc0
 device_release_driver_internal+0xd8/0x1b0
 unbind_store+0x94/0x120
 kernfs_fop_write+0x108/0x190
 vfs_write+0xa5/0x1a0
 ksys_write+0x59/0xd0
 do_syscall_64+0x4e/0x120
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7faca4a7b36f
Code: 1f 40 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 83 ec 10 e8 53 fd ff ff
4c 89 e2 41 89 c0 48 89 ee 89 df b8 01 00 00 00 0f 05 <48> 3d 00 f0 >
RSP: 002b:00007fac9a81e4d0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007faca4a7b36f
RDX: 000000000000000c RSI: 00007fac84019a20 RDI: 0000000000000012
RBP: 00007fac84019a20 R08: 0000000000000000 R09: 000000000000002f
R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000c
R13: 0000000000000000 R14: 0000000000000012 R15: 00007fac9a81e568
---[ end trace ffd153eee3d00ec4 ]---
amdgpu 0000:01:00.0: 00000000001146cc unpin not necessary

The warning is produced by
https://github.com/torvalds/linux/blob/574cc4539762561d96b456dbc0544d8898bd4c6e/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c#L937
I wonder whether the buffer object pin count is something like a reference
count.
Also, it looks like the message

*ERROR* Device removal is currently not supported outside of fbcon

is printed unconditionally, without checking whether DRM nodes are being used
by userspace clients. I wonder if it's possible to implement such a check and
prevent the unbind if they are.

