[PATCH] Increase AMDGPU_MAX_UVD_INSTANCES to 3

leo.liu@xxxxxxx (Leo Liu) · Mon, 25 Jun 2018 15:13:58 -0400

The problem is reproducible with UBSAN enabled.

================================================================================
[    3.866643] UBSAN: Undefined behaviour in drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:379:29
[    3.866656] index 2 is out of range for type 'amdgpu_uvd_inst [2]'
[    3.866667] CPU: 0 PID: 59 Comm: kworker/0:1 Not tainted 4.16.0-rc7+ #3
[    3.866677] Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD7/GA-990FXA-UD7, BIOS F9 06/08/2012
[    3.866693] Workqueue: events work_for_cpu_fn
[    3.866702] Call Trace:
[    3.866710]  dump_stack+0x85/0xc5
[    3.866719]  ubsan_epilogue+0x9/0x40
[    3.866727]  __ubsan_handle_out_of_bounds+0x89/0x90
[    3.866737]  ? rcu_read_lock_sched_held+0x58/0x60
[    3.866746]  ? __kmalloc+0x26c/0x2d0
[    3.866846]  amdgpu_fence_driver_start_ring+0x259/0x280 [amdgpu]
[    3.866896]  amdgpu_ring_init+0x12c/0x710 [amdgpu]
[    3.866906]  ? sprintf+0x42/0x50
[    3.866956]  amdgpu_gfx_kiq_init_ring+0x1bc/0x3a0 [amdgpu]
[    3.867009]  gfx_v8_0_sw_init+0x1ad3/0x2360 [amdgpu]
[    3.867062]  ? smu7_init+0xec/0x160 [amdgpu]
[    3.867109]  amdgpu_device_init+0x112c/0x1dc0 [amdgpu]
[    3.867120]  ? rcu_read_lock_sched_held+0x58/0x60
[    3.867166]  amdgpu_driver_load_kms+0x74/0x2e0 [amdgpu]
[    3.867178]  drm_dev_register+0x134/0x1c0
[    3.867223]  amdgpu_pci_probe+0x163/0x270 [amdgpu]
[    3.867233]  local_pci_probe+0x42/0xa0
[    3.867242]  work_for_cpu_fn+0x16/0x20
[    3.867250]  process_one_work+0x269/0x640
[    3.867260]  worker_thread+0x216/0x3d0
[    3.867268]  ? process_one_work+0x640/0x640
[    3.867276]  kthread+0x113/0x130
[    3.867282]  ? kthread_create_worker_on_cpu+0x50/0x50
[    3.867293]  ret_from_fork+0x27/0x50
[    3.867304] ================================================================================
[    3.869808] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[    3.871505] [drm] Found VCE firmware Version: 53.26 Binary ID: 3

The fix will follow.

Regards,
Leo

On 06/25/2018 03:02 PM, Alex Deucher wrote:
> On Mon, Jun 25, 2018 at 2:59 PM, James Zhu <jamesz at amd.com> wrote:
>>
>> On 2018-06-25 02:53 PM, Alex Deucher wrote:
>>
>> On Mon, Jun 25, 2018 at 2:37 PM, James Zhu <jamesz at amd.com> wrote:
>>
>> For the one UVD instance case:
>>
>>
>> In function amdgpu_driver_load_kms, all ring->me should be set to zero.
>>      adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>>
>>
>> For the two UVD instance case:
>>
>> static void uvd_v7_0_set_ring_funcs(struct amdgpu_device *adev)
>> ..
>>      for (i = 0; i < adev->uvd.num_uvd_inst; i++) {
>>          adev->uvd.inst[i].ring.me = i;
>>
>> static void uvd_v7_0_set_enc_ring_funcs(struct amdgpu_device *adev)
>>
>>      for (j = 0; j < adev->uvd.num_uvd_inst; j++) {
>>              adev->uvd.inst[j].ring_enc[i].me = j;
>>
>> uvd_v4_2_early_init in uvd_v4_2.c  adev->uvd.num_uvd_inst = 1;
>> uvd_v5_0_early_init in uvd_v5_0.c  adev->uvd.num_uvd_inst = 1;
>> uvd_v6_0_early_init in uvd_v6_0.c  adev->uvd.num_uvd_inst = 1;
>> uvd_v7_0_early_init in uvd_v7_0.c
>>      if (adev->asic_type == CHIP_VEGA20)
>>          adev->uvd.num_uvd_inst = UVD7_MAX_HW_INSTANCES_VEGA20;/*2*/
>>      else
>>          adev->uvd.num_uvd_inst = 1;
>>
>>
>> I don't know when ring->me is set to 2. Maybe there is some leakage
>> somewhere.
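
The argument quoted above can be modelled in userspace as a rough sketch. It assumes calloc() as a stand-in for kzalloc() and uses simplified, hypothetical structs and helpers (struct device, device_alloc(), set_ring_funcs(), max_ring_me()) that are not the real amdgpu definitions. It shows that with zero-initialised memory and the quoted loop, ring.me never exceeds num_uvd_inst - 1:

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_UVD_INSTANCES 2

/* Simplified, hypothetical versions of the driver structures. */
struct ring {
	unsigned int me;
};

struct uvd_inst {
	struct ring ring;
};

struct device {
	unsigned int num_uvd_inst;
	struct uvd_inst inst[MAX_UVD_INSTANCES];
};

/* calloc() zero-fills like kzalloc(), so every ring.me starts at 0. */
static struct device *device_alloc(void)
{
	return calloc(1, sizeof(struct device));
}

/* Mirrors the quoted loop: instance i gets ring.me = i. */
static void set_ring_funcs(struct device *dev, unsigned int num_uvd_inst)
{
	unsigned int i;

	assert(num_uvd_inst <= MAX_UVD_INSTANCES);
	dev->num_uvd_inst = num_uvd_inst;
	for (i = 0; i < dev->num_uvd_inst; i++)
		dev->inst[i].ring.me = i;
}

/* Largest ring.me across all slots, whether in use or not. */
static unsigned int max_ring_me(const struct device *dev)
{
	unsigned int i, max = 0;

	for (i = 0; i < MAX_UVD_INSTANCES; i++)
		if (dev->inst[i].ring.me > max)
			max = dev->inst[i].ring.me;
	return max;
}
```

In this model the largest value is 0 for one instance and 1 for two, so a value of 2 would have to come from a ring that is not one of these UVD rings.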
>>
>> What about older uvd (4.2, 5.0, 6.0) blocks?
>>
>> I think the code below will reset
>> adev->uvd.inst[AMDGPU_MAX_UVD_INSTANCES].ring->me and
>> adev->uvd.inst[AMDGPU_MAX_UVD_INSTANCES].ring_enc[AMDGPU_MAX_UVD_ENC_RINGS]->me
>> to zero for the older UVD IP blocks.
>>
>> adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>>
>> Do I understand correctly?
> Yes, it should.  That's why it doesn't make sense that it would be
> getting another value.
>
> Alex
>
>> James
>>
>> Alex
>>
>> Best regards!
>>
>> James zhu
>>
>>
>> On 2018-06-25 01:29 PM, Deucher, Alexander wrote:
>>
>> Odd. The structure should be 0 initialized.  Does this patch help?
>>
>>
>> Alex
>>
>> ________________________________
>> From: Timothy Pearson <tpearson at raptorengineering.com>
>> Sent: Monday, June 25, 2018 11:53:12 AM
>> To: Zhu, James
>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander; Zhou,
>> David(ChunMing); Koenig, Christian
>> Subject: Re: [PATCH] Increase AMDGPU_MAX_UVD_INSTANCES to 3
>>
>> On 06/25/2018 09:46 AM, James Zhu wrote:
>>
>> On 2018-06-23 08:02 PM, Timothy Pearson wrote:
>>
>> amdgpu_fence_driver_start_ring() attempts to access
>> UVD instance 2 during setup, while the existing UVD
>> instance count only allows instances 0 and 1.
>>
>> Increase AMDGPU_MAX_UVD_INSTANCES by one to avoid the
>> invalid array access.
>>
>> Caught by UBSAN.
>>
>> Hi Timothy,
>>
>> From a design point of view, it is not right to just change
>> AMDGPU_MAX_UVD_INSTANCES to 3.
>>
>> Could you give me some details of the UBSAN test and also attach the dmesg?
>>
>> Definitely, was looking for some feedback from anyone knowing more about
>> the internals of the UVD system.
>>
>> What's happening is that "ring->me" in amdgpu_fence_driver_start_ring()
>> (drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:379) is set to a value of
>> "2".  The overall dmesg is otherwise uninteresting, but I can try to
>> grab the UBSAN output if needed.
>>
>> --
>> Timothy Pearson
>> Raptor Engineering
>> +1 (415) 727-8645 (direct line)
>> +1 (512) 690-0200 (switchboard)
>> https://www.raptorengineering.com
>>
>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>>