DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))

felix.kuehling@xxxxxxx (Felix Kuehling) · Wed, 11 Jul 2018 20:43:58 -0400

Kent just caught a similar backtrace in one of our KFD pre-submission
tests (see below).

Neither KFD nor AMDGPU is implicated in the backtrace. Is this a
regression in the kernel itself? amd-kfd-staging is currently based on
4.18-rc1.

Regards,
  Felix

[   19.435544] ------------[ cut here ]------------
[   19.435551] DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
[   19.435558] WARNING: CPU: 2 PID: 3194 at /home/jenkins/jenkins-root/workspace/compute-psdb/kernel/kernel/locking/rwsem.c:217 up_read_non_owner+0x58/0x60
[   19.435572] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables x_tables nf_nat nf_conntrack br_netfilter fuse acpi_pad x86_pkg_temp_thermal video amdkfd amd_iommu_v2 amdgpu chash gpu_sched ttm
[   19.435598] CPU: 2 PID: 3194 Comm: correlator_test Not tainted 4.18.0-rc1-kfd-compute-psdb-22716 #1
[   19.435604] Hardware name: MSI MS-7977/Z170A GAMING M5 (MS-7977), BIOS 1.C0 10/19/2016
[   19.435611] RIP: 0010:up_read_non_owner+0x58/0x60
[   19.435615] Code: b0 00 5b c3 e8 c9 39 54 00 85 c0 74 df 8b 05 b7 72 a1 02 85 c0 75 d5 48 c7 c6 f8 a0 32 b8 48 c7 c7 ab e9 30 b8 e8 28 e7 f9 ff <0f> 0b eb be 0f 1f 40 00 0f 1f 44 00 00 53 48 8b 74 24 08 48 89 fb
[   19.435661] RSP: 0018:ffffb1f0c2483c28 EFLAGS: 00010286
[   19.435666] RAX: 0000000000000000 RBX: ffff99bd19633c80 RCX: 0000000000000006
[   19.435671] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff99bd2ed158f0
[   19.435676] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   19.435682] R10: ffffb1f0c2483bc8 R11: ffffffffb70e5b0a R12: ffff99bd181c4800
[   19.435687] R13: 0000000001a59000 R14: 0000000001a58000 R15: 0000000000000000
[   19.435693] FS:  00007fae045bb700(0000) GS:ffff99bd2ed00000(0000) knlGS:0000000000000000
[   19.435699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   19.435704] CR2: 00007fadff7fe250 CR3: 000000045745e004 CR4: 00000000003606e0
[   19.435710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   19.435715] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   19.435720] Call Trace:
[   19.435726]  __mmu_notifier_invalidate_range_end+0x9b/0xe0
[   19.435732]  unmap_region+0xae/0x120
[   19.435738]  ? __vma_rb_erase+0x11e/0x240
[   19.435744]  do_munmap+0x262/0x400
[   19.435749]  mmap_region+0xb1/0x5d0
[   19.435755]  ? selinux_file_mprotect+0x140/0x140
[   19.435760]  do_mmap+0x489/0x660
[   19.435765]  ? vm_mmap_pgoff+0x9f/0x110
[   19.435770]  vm_mmap_pgoff+0xcf/0x110
[   19.435776]  ksys_mmap_pgoff+0x1b4/0x260
[   19.435781]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   19.435787]  do_syscall_64+0x56/0x1a0
[   19.435792]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   19.435797] RIP: 0033:0x7fae03bca6ba
[   19.435800] Code: 89 f5 41 54 49 89 fc 55 53 74 35 49 63 e8 48 63 da 4d 89 f9 49 89 e8 4d 63 d6 48 89 da 4c 89 ee 4c 89 e7 b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 56 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 00
[   19.435847] RSP: 002b:00007ffd3f8dc058 EFLAGS: 00000206 ORIG_RAX: 0000000000000009
[   19.435853] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fae03bca6ba
[   19.435859] RDX: 0000000000000003 RSI: 0000000000001000 RDI: 0000000001a58000
[   19.435864] RBP: 0000000000000006 R08: 0000000000000006 R09: 0000000104a2a000
[   19.435870] R10: 0000000000000011 R11: 0000000000000206 R12: 0000000001a58000
[   19.435875] R13: 0000000000001000 R14: 0000000000000011 R15: 0000000104a2a000
[   19.435883] irq event stamp: 416603
[   19.435887] hardirqs last  enabled at (416603): [<ffffffffb7002b42>] do_syscall_64+0x12/0x1a0
[   19.435894] hardirqs last disabled at (416602): [<ffffffffb7c00082>] entry_SYSCALL_64_after_hwframe+0x3e/0xbe
[   19.435902] softirqs last  enabled at (415488): [<ffffffffb7e00393>] __do_softirq+0x393/0x4a6
[   19.435910] softirqs last disabled at (415471): [<ffffffffb7076261>] irq_exit+0xc1/0xd0
[   19.435916] ---[ end trace 3e22281c2c3bcb4c ]---

On 2018-07-11 12:11 PM, Michel Dänzer wrote:
> I've been occasionally getting the debugging warnings seen in the
> attached kernel log excerpt. Only for piglit amd_pinned_memory and for
> libdrm amdgpu_test, so I suspect it's pointing at a userptr-related
> issue. Christian, any ideas?