On 11/7/23 15:47, Alex Deucher wrote:
> On Tue, Nov 7, 2023 at 9:19 AM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>> On Tue, Nov 7, 2023 at 5:52 AM Christian König
>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>>> On 03.11.23 at 23:10, Alex Deucher wrote:
>>>> On Fri, Nov 3, 2023 at 4:17 PM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>>>>> On Thu, Oct 26, 2023 at 4:17 PM Luben Tuikov <ltuikov89@xxxxxxxxx> wrote:
>>>>>> Pushed to drm-misc-next.
>>>>> BTW, I'm seeing the following on older GPUs with VCE and UVD even with
>>>>> this patch:
>>>>> [ 11.886024] amdgpu 0000:0a:00.0: [drm] *ERROR* drm_sched_job_init:
>>>>> entity has no rq!
>>>>> [ 11.886028] amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests
>>>>> [amdgpu]] *ERROR* IB test failed on uvd (-2).
>>>>> [ 11.889927] amdgpu 0000:0a:00.0: [drm] *ERROR* drm_sched_job_init:
>>>>> entity has no rq!
>>>>> [ 11.889930] amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests
>>>>> [amdgpu]] *ERROR* IB test failed on vce0 (-2).
>>>>> [ 11.890172] [drm:process_one_work] *ERROR* ib ring test failed (-2).
>>>>> Seems to be specific to UVD and VCE, I don't see anything similar with
>>>>> VCN, but the flows for both are pretty similar. Not sure why we are
>>>>> not seeing it for VCN. Just a heads up if you have any ideas. Will
>>>>> take a closer look next week.
>>>> + Leo
>>>>
>>>> I found the problem. We set up scheduling entities for UVD and VCE
>>>> specifically and not for any other engines. I don't remember why
>>>> offhand. I'm guessing maybe to deal with the session limits on UVD
>>>> and VCE? If so I'm not sure of a clean way to fix this.
>>>
>>> I haven't looked through all my mails yet so could be that Leo has
>>> already answered this.
>>>
>>> The UVD/VCE entities are used for the older chips where applications
>>> have to use create/destroy messages to the firmware.
>>>
>>> If an application exits without cleaning up their handles the kernel
>>> sends the appropriate destroy messages itself. For an example see
>>> amdgpu_uvd_free_handles().
>>>
>>> We used to initialize those entities with separate calls after the
>>> scheduler had been brought up, see amdgpu_uvd_entity_init() for an example.
>>>
>>> But this was somehow messed up and we now do the call to
>>> amdgpu_uvd_entity_init() at the end of *_sw_init() instead of _late_init().
>>>
>>> I suggest to just come up with a function which can be used for the
>>> late_init() callback of the UVD/VCE blocks.
>>
>> I guess the issue is that we only need to initialize the entity once
>> so sw_init makes sense. All of the other functions get called at
>> resume time, etc. I think we could probably put it into
>> amdgpu_device_init_schedulers() somehow.
>
> I think something like this might do the trick.

This does indeed fix the IB test failures for me with Bonaire.

There are still "[drm] Fence fallback timer expired on ring sdma0"
messages, though that might be a separate regression.


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
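
P.S. For anyone skimming the thread, here is a rough toy model of the
ordering problem described in the quoted text: an entity that is set up
before its scheduler has been brought up ends up without a run queue, and
job init then fails with -ENOENT (the -2 in the log). All toy_* names are
made up for illustration; this is not the amdgpu or drm_sched code, and the
real structures and checks are more involved.

```c
/* Toy model only: every toy_* type and function here is a hypothetical
 * stand-in, not a real drm_sched or amdgpu structure. */
#include <errno.h>
#include <stddef.h>

struct toy_rq {
	int id;
};

struct toy_sched {
	int ready;		/* set once the scheduler has been brought up */
	struct toy_rq rq;
};

struct toy_entity {
	struct toy_rq *rq;	/* NULL until bound to a ready scheduler */
};

/* Bring up the scheduler (stands in for the device-wide scheduler init). */
void toy_sched_init(struct toy_sched *s)
{
	s->rq.id = 0;
	s->ready = 1;
}

/* Bind an entity to the scheduler's run queue. If the scheduler is not
 * ready yet, the entity is left without a run queue. */
void toy_entity_init(struct toy_entity *e, struct toy_sched *s)
{
	e->rq = s->ready ? &s->rq : NULL;
}

/* Stands in for the "entity has no rq!" check: fail with -ENOENT. */
int toy_job_init(const struct toy_entity *e)
{
	return e->rq ? 0 : -ENOENT;
}

/* Entity set up first, scheduler afterwards (the sw_init-time ordering). */
int toy_early_entity_order(void)
{
	struct toy_sched s = {0};
	struct toy_entity e = {0};

	toy_entity_init(&e, &s);
	toy_sched_init(&s);
	return toy_job_init(&e);
}

/* Scheduler first, entity afterwards (the ordering suggested above). */
int toy_late_entity_order(void)
{
	struct toy_sched s = {0};
	struct toy_entity e = {0};

	toy_sched_init(&s);
	toy_entity_init(&e, &s);
	return toy_job_init(&e);
}
```

In this model toy_early_entity_order() returns -ENOENT and
toy_late_entity_order() returns 0, which is the difference between the
failing and the passing IB test as far as I understand the discussion.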