Hey Oded,

Sorry to be a nuisance, but if you have everything still setup could you give this fix a quick go?

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 5321d18..9f70ee0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -667,7 +667,7 @@ static int set_sched_resources(struct device_queue_manager *dqm)
 		/* This situation may be hit in the future if a new HW
 		 * generation exposes more than 64 queues. If so, the
 		 * definition of res.queue_mask needs updating */
-		if (WARN_ON(i > sizeof(res.queue_mask))) {
+		if (WARN_ON(i > (sizeof(res.queue_mask)*8))) {
 			pr_err("Invalid queue enabled by amdgpu: %d\n", i);
 			break;
 		}

John/Felix,

Any chance I could borrow a carrizo/kaveri for a few days? Or maybe you could help me run some final tests on this patch series?

- Andres

On 2017-02-09 03:11 PM, Oded Gabbay wrote:
> Andres,
>
> I tried your patches on Kaveri with airlied's drm-next branch.
> I used radeon+amdkfd
>
> The following test failed: KFDQMTest.CreateMultipleCpQueues
> However, I can't debug it because I don't have the sources of kfdtest.
>
> In dmesg, I saw the following warning during boot:
> WARNING: CPU: 0 PID: 150 at
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:670
> start_cpsch+0xc5/0x220 [amdkfd]
> [ 4.393796] Modules linked in: hid_logitech_hidpp hid_logitech_dj
> hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon(+)
> i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt
> libahci fb_sys_fops drm r8169 mii fjes video
> [ 4.393811] CPU: 0 PID: 150 Comm: systemd-udevd Not tainted 4.10.0-rc5+ #1
> [ 4.393811] Hardware name: Gigabyte Technology Co., Ltd. To be
> filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
> [ 4.393812] Call Trace:
> [ 4.393818] dump_stack+0x63/0x90
> [ 4.393822] __warn+0xcb/0xf0
> [ 4.393823] warn_slowpath_null+0x1d/0x20
> [ 4.393830] start_cpsch+0xc5/0x220 [amdkfd]
> [ 4.393836] ? initialize_cpsch+0xa0/0xb0 [amdkfd]
> [ 4.393841] kgd2kfd_device_init+0x375/0x490 [amdkfd]
> [ 4.393883] radeon_kfd_device_init+0xaf/0xd0 [radeon]
> [ 4.393911] radeon_driver_load_kms+0x11e/0x1f0 [radeon]
> [ 4.393933] drm_dev_register+0x14a/0x200 [drm]
> [ 4.393946] drm_get_pci_dev+0x9d/0x160 [drm]
> [ 4.393974] radeon_pci_probe+0xb8/0xe0 [radeon]
> [ 4.393976] local_pci_probe+0x45/0xa0
> [ 4.393978] pci_device_probe+0x103/0x150
> [ 4.393981] driver_probe_device+0x2bf/0x460
> [ 4.393982] __driver_attach+0xdf/0xf0
> [ 4.393984] ? driver_probe_device+0x460/0x460
> [ 4.393985] bus_for_each_dev+0x6c/0xc0
> [ 4.393987] driver_attach+0x1e/0x20
> [ 4.393988] bus_add_driver+0x1fd/0x270
> [ 4.393989] ? 0xffffffffc05c8000
> [ 4.393991] driver_register+0x60/0xe0
> [ 4.393992] ? 0xffffffffc05c8000
> [ 4.393993] __pci_register_driver+0x4c/0x50
> [ 4.394007] drm_pci_init+0xeb/0x100 [drm]
> [ 4.394008] ? 0xffffffffc05c8000
> [ 4.394031] radeon_init+0x98/0xb6 [radeon]
> [ 4.394034] do_one_initcall+0x53/0x1a0
> [ 4.394037] ? __vunmap+0x81/0xd0
> [ 4.394039] ? kmem_cache_alloc_trace+0x152/0x1c0
> [ 4.394041] ? vfree+0x2e/0x70
> [ 4.394044] do_init_module+0x5f/0x1ff
> [ 4.394046] load_module+0x24cc/0x29f0
> [ 4.394047] ? __symbol_put+0x60/0x60
> [ 4.394050] ? security_kernel_post_read_file+0x6b/0x80
> [ 4.394052] SYSC_finit_module+0xdf/0x110
> [ 4.394054] SyS_finit_module+0xe/0x10
> [ 4.394056] entry_SYSCALL_64_fastpath+0x1e/0xad
> [ 4.394058] RIP: 0033:0x7f9cda77c8e9
> [ 4.394059] RSP: 002b:00007ffe195d3378 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000139
> [ 4.394060] RAX: ffffffffffffffda RBX: 00007f9cdb8dda7e RCX: 00007f9cda77c8e9
> [ 4.394061] RDX: 0000000000000000 RSI: 00007f9cdac7ce2a RDI: 0000000000000013
> [ 4.394062] RBP: 00007ffe195d2450 R08: 0000000000000000 R09: 0000000000000000
> [ 4.394063] R10: 0000000000000013 R11: 0000000000000246 R12: 00007ffe195d245a
> [ 4.394063] R13: 00007ffe195d1378 R14: 0000563f70cc93b0 R15: 0000563f70cba4d0
> [ 4.394091] ---[ end trace 9c5af17304d998bb ]---
> [ 4.394092] Invalid queue enabled by amdgpu: 9
>
> I suggest you get a Kaveri/Carrizo machine to debug these issues.
>
> Until that, I don't think we should merge this patch-set.
>
> Oded
>
> On Wed, Feb 8, 2017 at 9:47 PM, Andres Rodriguez <andresx7 at gmail.com> wrote:
>> Thank you Oded.
>>
>> - Andres
>>
>>
>> On 2017-02-08 02:32 PM, Oded Gabbay wrote:
>>> On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez <andresx7 at gmail.com>
>>> wrote:
>>>> Hey Felix,
>>>>
>>>> Thanks for the pointer to the ROCm mqd commit. I like that the
>>>> workarounds
>>>> are easy to spot. I'll add that to a new patch series I'm working on for
>>>> some bug-fixes for perf being lower on pipes other than pipe 0.
>>>>
>>>> I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the
>>>> HW
>>>> will be able to give it a go. I put in a few small hacks to get KFD to
>>>> boot
>>>> but do nothing on polaris10.
>>>>
>>>> Regards,
>>>> Andres
>>>>
>>>>
>>>> On 2017-02-06 03:20 PM, Felix Kuehling wrote:
>>>>> Hi Andres,
>>>>>
>>>>> Thank you for tackling this task. It's more involved than I expected,
>>>>> mostly because I didn't have much awareness of the MQD management in
>>>>> amdgpu.
>>>>>
>>>>> I made one comment in a separate message about the unified MQD commit
>>>>> function, if you want to bring that more in line with our latest ROCm
>>>>> release on github.
>>>>>
>>>>> Also, were you able to test the upstream KFD with your changes on a
>>>>> Kaveri or Carrizo?
>>>>>
>>>>> Regards,
>>>>> Felix
>>>>>
>>>>>
>>>>> On 17-02-03 11:51 PM, Andres Rodriguez wrote:
>>>>>> The current queue/pipe split policy is for amdgpu to take the first
>>>>>> pipe
>>>>>> of
>>>>>> MEC0 and leave the rest for amdkfd to use. This policy is taken as an
>>>>>> assumption in a few areas of the implementation.
>>>>>>
>>>>>> This patch series aims to allow for flexible/tunable queue/pipe split
>>>>>> policies
>>>>>> between kgd and kfd. It also updates the queue/pipe split policy to one
>>>>>> that
>>>>>> allows better compute app concurrency for both drivers.
>>>>>>
>>>>>> In the process some duplicate code and hardcoded constants were
>>>>>> removed.
>>>>>>
>>>>>> Any suggestions or feedback on improvements welcome.
>>>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>> Hi Andres,
>>> I will try to find sometime to test it on my Kaveri machine.
>>>
>>> Oded
>>
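
For anyone following along, here is a minimal sketch of what the one-line fix at the top of this message changes. It assumes res.queue_mask is a 64-bit integer, which the comment in the patched code implies; queue_mask_t and the two helper names are hypothetical stand-ins, not amdkfd symbols. sizeof yields a size in bytes, so the old check compared the bit index i against 8 rather than 64, which is consistent with the "Invalid queue enabled by amdgpu: 9" line in the dmesg output above.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for the type of res.queue_mask (assumed 64-bit). */
typedef uint64_t queue_mask_t;

/* Old check: compares the bit index against the mask size in BYTES (8). */
int old_check_rejects(unsigned int i)
{
	return i > sizeof(queue_mask_t);
}

/* Patched check: compares the bit index against the mask size in BITS (64).
 * CHAR_BIT is 8 on the platforms in question, matching the "*8" in the diff. */
int new_check_rejects(unsigned int i)
{
	return i > sizeof(queue_mask_t) * CHAR_BIT;
}
```

With these helpers, index 9 is rejected by the old check and accepted by the patched one. Index 64 still passes the patched `>` comparison even though a 64-bit mask only has bits 0 through 63; whether that can occur depends on the loop bound in set_sched_resources(), which is not shown in the diff.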