Re: RE: [PATCH] drm/amdgpu: fix fence slab teardown

notasas@xxxxxxxxx (Grazvydas Ignotas) · Mon, 24 Oct 2016 12:32:27 +0300

On Mon, Oct 24, 2016 at 6:35 AM, Qu, Jim <Jim.Qu at amd.com> wrote:
> I did observe the issue when replacing the kernel module using DKMS; it may then hit an error at reboot, with this call trace:
>
> [ 3529.525360] =============================================================================
> [ 3529.525361] BUG amd_sched_fence (Tainted: G    B      OE  ------------  ): Objects remaining in amd_sched_fence on kmem_cache_close()
> [ 3529.525361] -----------------------------------------------------------------------------
> [ 3529.525361]
> [ 3529.525361] INFO: Slab 0xffffea000094b200 objects=25 used=2 fp=0xffff8800252c9180 flags=0x1fffff00004080
> [ 3529.525362] CPU: 0 PID: 18523 Comm: reboot Tainted: G    B      OE  ------------   3.10.0-512.el7.x86_64 #1
> [ 3529.525362] Hardware name: ASUS All Series/Z87-PLUS, BIOS 1802 01/28/2014
> [ 3529.525363]  ffffea000094b200 00000000b3b19dcf ffff880160827b50 ffffffff81685e8c
> [ 3529.525363]  ffff880160827c28 ffffffff811d9e34 ffff880000000020 ffff880160827c38
> [ 3529.525364]  ffff880160827be8 656a624f818de5f0 616d657220737463 6e6920676e696e69
> [ 3529.525364] Call Trace:
> [ 3529.525365]  [<ffffffff81685e8c>] dump_stack+0x19/0x1b
> [ 3529.525366]  [<ffffffff811d9e34>] slab_err+0xb4/0xe0
> [ 3529.525367]  [<ffffffff81088c29>] ? vprintk_default+0x29/0x40
> [ 3529.525368]  [<ffffffff8167f434>] ? printk+0x5e/0x75
> [ 3529.525369]  [<ffffffff811dd133>] ? __kmalloc+0x1f3/0x240
> [ 3529.525370]  [<ffffffff811df80b>] ? kmem_cache_close+0x12b/0x2f0
> [ 3529.525370]  [<ffffffff811df82c>] kmem_cache_close+0x14c/0x2f0
> [ 3529.525371]  [<ffffffff811df9e4>] __kmem_cache_shutdown+0x14/0x80
> [ 3529.525372]  [<ffffffff811a5704>] kmem_cache_destroy+0x44/0xf0
> [ 3529.525387]  [<ffffffffa02bfb0c>] amd_sched_fini+0x3c/0x40 [amdgpu]
> [ 3529.525395]  [<ffffffffa0231bfa>] amdgpu_fence_driver_fini+0x7a/0x110 [amdgpu]
> [ 3529.525403]  [<ffffffffa02230dd>] amdgpu_device_fini+0x3d/0x1f0 [amdgpu]
> [ 3529.525411]  [<ffffffffa0225673>] amdgpu_driver_unload_kms+0x43/0x80 [amdgpu]
> [ 3529.525416]  [<ffffffffa005fb89>] drm_dev_unregister+0x29/0xb0 [drm]
> [ 3529.525422]  [<ffffffffa0060273>] drm_put_dev+0x23/0x70 [drm]
> [ 3529.525429]  [<ffffffffa021f3fd>] amdgpu_pci_shutdown+0x1d/0x20 [amdgpu]
> [ 3529.525430]  [<ffffffff81359b56>] pci_device_shutdown+0x36/0x70
> [ 3529.525431]  [<ffffffff8142a388>] device_shutdown+0xc8/0x180
> [ 3529.525432]  [<ffffffff810a1536>] kernel_restart_prepare+0x36/0x40
> [ 3529.525433]  [<ffffffff810a1552>] kernel_restart+0x12/0x60
> [ 3529.525433]  [<ffffffff810a17c9>] SYSC_reboot+0x229/0x260
> [ 3529.525435]  [<ffffffff81691971>] ? __do_page_fault+0x171/0x450
> [ 3529.525436]  [<ffffffff810a186e>] SyS_reboot+0xe/0x10
> [ 3529.525437]  [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
> [ 3529.525438] INFO: Object 0xffff8800252c8a00 @offset=2560
> [ 3529.525438] INFO: Object 0xffff8800252c9540 @offset=5440
>
>
> Does this patch series fix this issue?

Yes, but only partially - there are still some leaked objects left.
If the kernel is built with CONFIG_SLUB_DEBUG, you can also enable
CONFIG_SLUB_DEBUG_ON or add "slub_debug" to the kernel command line to
see the backtraces of the leaked allocations.

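As a rough sketch of the command-line route (the helper name
has_slub_debug is made up for illustration, and the sysfs path assumes a
kernel of this era that exposes per-cache files under /sys/kernel/slab),
something like this can confirm the option was actually passed:

```shell
#!/bin/sh
# Sketch: report whether SLUB debugging was requested on a kernel command line.
# has_slub_debug is a hypothetical helper, not part of any kernel tooling.
# It only covers the boot-parameter route; a kernel built with
# CONFIG_SLUB_DEBUG_ON has debugging on by default without the parameter.
has_slub_debug() {
    # $1: a kernel command line string, e.g. the contents of /proc/cmdline
    for arg in $1; do
        case "$arg" in
            slub_debug|slub_debug=*) return 0 ;;
        esac
    done
    return 1
}

# Typical use on a running system:
#   has_slub_debug "$(cat /proc/cmdline)" && echo "slub_debug was requested"
#
# To limit the overhead to the cache from the trace above, the boot
# parameter can name it and request only user tracking (U):
#   slub_debug=U,amd_sched_fence
# While the module is loaded, allocation call sites of live objects can
# then be read from (assumed location, may differ on other kernels):
#   /sys/kernel/slab/amd_sched_fence/alloc_calls
```

Restricting the option to the one cache keeps the rest of the system at
normal speed, which matters if the leak only shows up after a long run.
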
Gražvydas