On Wed, Nov 29, 2017 at 4:54 PM, Michel Dänzer <michel at daenzer.net> wrote:
> On 2017-11-29 03:40 PM, Oded Gabbay wrote:
>> On Wed, Nov 29, 2017 at 2:31 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>> On Wed, Nov 29, 2017 at 1:16 PM, Michel Dänzer <michel at daenzer.net> wrote:
>>>> On 2017-11-01 09:31 AM, Oded Gabbay wrote:
>>>>> ok, taken to -next.
>>>>
>>>> This change broke the radeon driver on my Kaveri laptop. The gdm login
>>>> screen works, but logging into the GNOME on Xorg session quickly results
>>>> in a GPU hang and associated badness, see the attached dmesg.
>>>>
>>>> Reverting this change on top of drm-next makes it work again.
>>>>
>>>> On a hunch, I've tried reverting commits 62a7b7fbd08e ("drm/radeon:
>>>> reduce number of free VMIDs and pipes in KV") and 28b57b856b63
>>>> ("drm/radeon/cik: Don't touch int of pipes 1-7"), but no luck.
>>>>
>>>> Any ideas for what else is missing?
>>>>
>>>> Note that the amdkfd driver isn't actually active anyway, because I'm
>>>> disabling the IOMMU. Is it possible that it's still doing or triggering
>>>> some needed HW setup before it bails in that case?
>>>>
>>>>
>>>> P.S. Assuming we can fix this without reverting, maybe we could also
>>>> remove rdev->grbm_idx_mutex again?
>>>>
>>>> --
>>>> Earthling Michel Dänzer | http://www.amd.com
>>>> Libre software enthusiast | Mesa and X developer
>>>
>>> Hi Michel,
>>> Even without IOMMU, amdkfd will initialize the module and internal
>>> structures per device, up to the point where it tries to register a
>>> callback with the iommu driver.
>>> If IOMMU is disabled, it will fail then with the following error
>>> message (in dmesg): "error getting iommu info. is the iommu enabled?"
>>>
>>> Having said that, it doesn't initialize anything in the device H/W
>>> itself, so I find this very weird.
>>>
>>> I looked at the patch itself again and I don't see anything suspicious.
>>>
>>> I'll try to resurrect my Kaveri machine to check this, but it will
>>> take some time.
>>>
>>> Oded
>>
>> Any chance that the increase of VMIDs from 8 to 16 somehow (although I
>> don't know how) caused this problem?
>> The desktop GUI also didn't work for me, but when I changed the VMID
>> number back to 8 (in cik.c) the GUI worked again.
>>
>> Michel, could you try this as well?
>
> Yeah, that also occurred to me in the meantime, and I can confirm your
> findings.
>
> My guess right now is that it's related to cik_pcie_init_compute_vmid.

Yeah, that seems reasonable. That initialization was originally part of
kfd, but then we moved it to radeon, and now it collides with radeon's
own initialization. I removed it completely, returned the number of
VMIDs to 16, and the GUI is working.

I'll send a patch.

Oded