The problem is reproducible with UBSAN enabled.

================================================================================
[    3.866643] UBSAN: Undefined behaviour in drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:379:29
[    3.866656] index 2 is out of range for type 'amdgpu_uvd_inst [2]'
[    3.866667] CPU: 0 PID: 59 Comm: kworker/0:1 Not tainted 4.16.0-rc7+ #3
[    3.866677] Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD7/GA-990FXA-UD7, BIOS F9 06/08/2012
[    3.866693] Workqueue: events work_for_cpu_fn
[    3.866702] Call Trace:
[    3.866710]  dump_stack+0x85/0xc5
[    3.866719]  ubsan_epilogue+0x9/0x40
[    3.866727]  __ubsan_handle_out_of_bounds+0x89/0x90
[    3.866737]  ? rcu_read_lock_sched_held+0x58/0x60
[    3.866746]  ? __kmalloc+0x26c/0x2d0
[    3.866846]  amdgpu_fence_driver_start_ring+0x259/0x280 [amdgpu]
[    3.866896]  amdgpu_ring_init+0x12c/0x710 [amdgpu]
[    3.866906]  ? sprintf+0x42/0x50
[    3.866956]  amdgpu_gfx_kiq_init_ring+0x1bc/0x3a0 [amdgpu]
[    3.867009]  gfx_v8_0_sw_init+0x1ad3/0x2360 [amdgpu]
[    3.867062]  ? smu7_init+0xec/0x160 [amdgpu]
[    3.867109]  amdgpu_device_init+0x112c/0x1dc0 [amdgpu]
[    3.867120]  ? rcu_read_lock_sched_held+0x58/0x60
[    3.867166]  amdgpu_driver_load_kms+0x74/0x2e0 [amdgpu]
[    3.867178]  drm_dev_register+0x134/0x1c0
[    3.867223]  amdgpu_pci_probe+0x163/0x270 [amdgpu]
[    3.867233]  local_pci_probe+0x42/0xa0
[    3.867242]  work_for_cpu_fn+0x16/0x20
[    3.867250]  process_one_work+0x269/0x640
[    3.867260]  worker_thread+0x216/0x3d0
[    3.867268]  ? process_one_work+0x640/0x640
[    3.867276]  kthread+0x113/0x130
[    3.867282]  ? kthread_create_worker_on_cpu+0x50/0x50
[    3.867293]  ret_from_fork+0x27/0x50
[    3.867304] ================================================================================
[    3.869808] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[    3.871505] [drm] Found VCE firmware Version: 53.26 Binary ID: 3

The fix will follow.

Regards,
Leo

On 06/25/2018 03:02 PM, Alex Deucher wrote:
> On Mon, Jun 25, 2018 at 2:59 PM, James Zhu <jamesz at amd.com> wrote:
>>
>> On 2018-06-25 02:53 PM, Alex Deucher wrote:
>>
>> On Mon, Jun 25, 2018 at 2:37 PM, James Zhu <jamesz at amd.com> wrote:
>>
>> For the one UVD instance case:
>>
>> In function amdgpu_driver_load_kms, all ring->me should be set to zero.
>>     adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>>
>> For the two UVD instances case:
>>
>> static void uvd_v7_0_set_ring_funcs(struct amdgpu_device *adev)
>> ..
>>     for (i = 0; i < adev->uvd.num_uvd_inst; i++) {
>>         adev->uvd.inst[i].ring.me = i;
>>
>> static void uvd_v7_0_set_enc_ring_funcs(struct amdgpu_device *adev)
>>
>>     for (j = 0; j < adev->uvd.num_uvd_inst; j++) {
>>         adev->uvd.inst[j].ring_enc[i].me = j;
>>
>> uvd_v4_2_early_init in uvd_v4_2.c    adev->uvd.num_uvd_inst = 1;
>> uvd_v5_0_early_init in uvd_v5_0.c    adev->uvd.num_uvd_inst = 1;
>> uvd_v6_0_early_init in uvd_v6_0.c    adev->uvd.num_uvd_inst = 1;
>> uvd_v7_0_early_init in uvd_v7_0.c
>>     if (adev->asic_type == CHIP_VEGA20)
>>         adev->uvd.num_uvd_inst = UVD7_MAX_HW_INSTANCES_VEGA20;/*2*/
>>     else
>>         adev->uvd.num_uvd_inst = 1;
>>
>>
>> I don't know when ring->me is set to 2. Maybe there is some leakage
>> somewhere.
>>
>> What about older uvd (4.2, 5.0, 6.0) blocks?
>>
>> I think the code below will reset
>> adev->uvd.inst[AMDGPU_MAX_UVD_INSTANCES].ring->me and
>> adev->uvd.inst[AMDGPU_MAX_UVD_INSTANCES].ring_enc[AMDGPU_MAX_UVD_ENC_RINGS]->me
>> to zero for the older UVD IP blocks.
>>
>>     adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>>
>> Do I understand correctly?
> Yes, it should.
> That's why it doesn't make sense that it would be
> getting another value.
>
> Alex
>
>> James
>>
>> Alex
>>
>> Best regards!
>>
>> James Zhu
>>
>>
>> On 2018-06-25 01:29 PM, Deucher, Alexander wrote:
>>
>> Odd. The structure should be 0 initialized. Does this patch help?
>>
>>
>> Alex
>>
>> ________________________________
>> From: Timothy Pearson <tpearson at raptorengineering.com>
>> Sent: Monday, June 25, 2018 11:53:12 AM
>> To: Zhu, James
>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander; Zhou,
>> David(ChunMing); Koenig, Christian
>> Subject: Re: [PATCH] Increase AMDGPU_MAX_UVD_INSTANCES to 3
>>
>> On 06/25/2018 09:46 AM, James Zhu wrote:
>>
>> On 2018-06-23 08:02 PM, Timothy Pearson wrote:
>>
>> amdgpu_fence_driver_start_ring() attempts to access
>> UVD instance 2 during setup, while the existing UVD
>> instance count only allows instances 0 and 1.
>>
>> Increase AMDGPU_MAX_UVD_INSTANCES by one to avoid the
>> invalid array access.
>>
>> Caught by UBSAN.
>>
>> Hi Timothy,
>>
>> From a design point of view, it is not right to just change
>> AMDGPU_MAX_UVD_INSTANCES to 3.
>>
>> Could you tell me some details of the UBSAN test and attach the dmesg also?
>>
>> Definitely, was looking for some feedback from anyone knowing more about
>> the internals of the UVD system.
>>
>> What's happening is that "ring->me" in amdgpu_fence_driver_start_ring()
>> (drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:379) is set to a value of
>> "2". The overall dmesg is otherwise uninteresting, but I can try to
>> grab the UBSAN output if needed.
>>
>> --
>> Timothy Pearson
>> Raptor Engineering
>> +1 (415) 727-8645 (direct line)
>> +1 (512) 690-0200 (switchboard)
>> https://www.raptorengineering.com
>>
>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
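
Note on the trace in this thread: UBSAN raises the report in amdgpu_fence_driver_start_ring() while it is being called from amdgpu_gfx_kiq_init_ring(), so the ring being set up at that point is a graphics KIQ ring, not a UVD ring. The thread does not state the root cause, and the promised fix is not shown here. One plausible reading is that ring->me has a different meaning for non-UVD rings and is used to index the UVD instance array before the ring type is known. The sketch below illustrates that failure mode and a guard against it in plain C. Every name in it (struct ring, struct device, MAX_UVD_INSTANCES, ring_is_uvd_main_ring) is a hypothetical stand-in chosen for illustration, not the amdgpu code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the 'amdgpu_uvd_inst [2]' type named in the UBSAN report. */
#define MAX_UVD_INSTANCES 2

/* Hypothetical, simplified stand-ins for the driver structures. */
enum ring_type { RING_TYPE_GFX, RING_TYPE_KIQ, RING_TYPE_UVD };

struct ring {
	enum ring_type type;
	unsigned int me;	/* assumed: meaning depends on the ring type */
};

struct uvd_inst {
	struct ring ring;
};

struct device {
	struct uvd_inst uvd_inst[MAX_UVD_INSTANCES];
};

/*
 * Returns true only when 'ring' is the main ring of a UVD instance.
 * The type check and the range check both run before the array is
 * indexed, so a non-UVD ring with me == 2 never reaches uvd_inst[2].
 */
static bool ring_is_uvd_main_ring(const struct device *dev,
				  const struct ring *ring)
{
	if (ring->type != RING_TYPE_UVD)
		return false;
	if (ring->me >= MAX_UVD_INSTANCES)
		return false;
	return ring == &dev->uvd_inst[ring->me].ring;
}
```

With this ordering, a KIQ ring whose me field is 2 is rejected by the type check, while a genuine UVD ring with me equal to 0 or 1 is still matched against its own instance slot.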