> > amdgpu_ring_alloc() itself is unconditionally setting count_dw, which
> > looked suspicious to me -- so I added the check shown below, and it
> > does look like ring_alloc() gets called again too soon.  Am I right in
> > thinking this could be the cause of amdgpu_ring_test_helper() failing
> > in timeout ?
> >
>
> Not likely.  The PSP failing to load firmware is most likely the
> problem.  You need to have a functional PSP for any of the other
> engines to be usable.  If we can't load the firmware for the
> microcontrollers, the driver can't interact with them.

Even if it has no effect on my primary issue, I still have a doubt
about this: if we call amdgpu_ring_alloc() twice without ensuring the
allocated space has been padded with NOPs (i.e. 0xFFFFFFFF, right?),
what happens when the GFX IP (or should we rather say "GC"?) parses
them?  My reading of gfx_enable_kcq() is that this is exactly what
happens there.  Isn't it missing a call to ring_commit() before
ring_test()?

>
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > @@ -70,6 +70,9 @@ int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw)
> >  	if (WARN_ON_ONCE(ndw > ring->max_dw))
> >  		return -ENOMEM;
> >
> > +	/* check we're not allocating too fast */
> > +	WARN_ON_ONCE(ring->count_dw);
> > +
> >  	ring->count_dw = ndw;
> >  	ring->wptr_old = ring->wptr;
> >
> >
> > About gfx_v9_0_sw_fini():
> > - the 2 calls to bo_free are called here without condition, whereas
> >   they are allocated from rlc_init, not directly from sw_init.  Is
> >   this asymmetry wanted ?
> >
> >
> > Maybe such info should join the documentation at some point?
>
> Yeah, would be useful.
>
> Alex
>
> >
> > [0] https://lists.freedesktop.org/archives/amd-gfx/2021-November/071855.html
>