On 08.10.24 at 17:05, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>

I've noticed the hardware ring padding optimisations have landed so I decided to respin the CPU side optimisations.

First two patches are simply adding ring fill helpers which deal with reducing the CPU cost of emitting hundreds of nops from the for-amdgpu_ring_write loops.

If you are receptive to the idea, please double-check I preserved endianness behaviour as is.
I'm pretty sure that this was broken before or at least uses HW features which are not guaranteed to work any more.
Sunil has already committed a set which does mostly the same as this one. The only things missing are the improvements for the IB patching and a bunch of things I've been working on recently.
I'm going to send those out in a minute. It would be cool if you could run some performance analysis on those patches as well, since you already seem to have the setup for that.
Thanks, Christian.
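To illustrate the ring fill idea from the quoted cover letter, here is a minimal userspace sketch. It is not the code from the series: `struct toy_ring`, `toy_ring_pad_loop()` and `toy_ring_pad_fill()` are hypothetical names, and the ring is assumed to have a power-of-two size in dwords. It contrasts one write call per NOP with at most two contiguous fills and a single write pointer update.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy ring for illustration; not the real struct amdgpu_ring. */
struct toy_ring {
	uint32_t *buf;     /* backing storage, size_dw dwords */
	uint32_t size_dw;  /* ring size in dwords, assumed to be a power of two */
	uint32_t wptr;     /* write index in dwords, always < size_dw */
};

/* Write one dword and advance the write pointer with wrap-around. */
static inline void toy_ring_write(struct toy_ring *r, uint32_t v)
{
	r->buf[r->wptr] = v;
	r->wptr = (r->wptr + 1) & (r->size_dw - 1);
}

/* Baseline: one call, one store and one pointer update per NOP. */
static inline void toy_ring_pad_loop(struct toy_ring *r, uint32_t nop,
				     uint32_t count)
{
	while (count--)
		toy_ring_write(r, nop);
}

/* Fill n consecutive dwords with the same value. */
static inline void toy_fill_dw(uint32_t *dst, uint32_t v, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++)
		dst[i] = v;
}

/* Bulk variant: at most two contiguous fills, write pointer updated once. */
static inline void toy_ring_pad_fill(struct toy_ring *r, uint32_t nop,
				     uint32_t count)
{
	uint32_t first;

	assert(count <= r->size_dw);

	/* Dwords that fit before the end of the buffer. */
	first = r->size_dw - r->wptr;
	if (first > count)
		first = count;

	toy_fill_dw(r->buf + r->wptr, nop, first);
	/* Remainder, if any, wraps to the start of the buffer. */
	toy_fill_dw(r->buf, nop, count - first);

	r->wptr = (r->wptr + count) & (r->size_dw - 1);
}
```

Both variants leave the buffer and the write pointer in the same state; the bulk variant only removes the per-dword call and masking overhead. In kernel code the inner fill could presumably be done with `memset32()`, but whether the series does so is not stated here.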
Last two patches are new and RFC. Both are incomplete conversions to two new helpers intended to deal with an often repeated pattern of:

- amdgpu_ring_write(ring, lower_32_bits(addr));
- amdgpu_ring_write(ring, upper_32_bits(addr));
+ amdgpu_ring_write_addr(ring, addr);

Last patch is the most uncertain one where there seems to be some magic bit used only on big endian. It has no name so I couldn't figure out what it was about.

- amdgpu_ring_write(ring,
-#ifdef __BIG_ENDIAN
- (2 << 0) |
-#endif
- lower_32_bits(ib->gpu_addr));
- amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr));
+ amdgpu_ring_write_addr_xbe(ring, ib->gpu_addr);

Anyway, both patterns have a lot of users so, reductions in source code and binary size aside, the main question is whether these kinds of helpers improve readability or make it worse.

(Note that the _xbe name in the last patch is just a placeholder.)

Cc: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Cc: Sunil Khatri <sunil.khatri@xxxxxxx>

Tvrtko Ursulin (4):
  drm/amdgpu: More efficient ring padding
  drm/amdgpu: More more efficient ring padding
  drm/amdgpu: Add and use amdgpu_ring_write_addr() helper
  drm/amdgpu: Document the magic big endian bit

 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  19 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 101 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c  |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c  |  25 +++---
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c    |  27 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   |  66 +++++----------
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   |  60 +++++---------
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c   |  45 ++++------
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    |  63 +++++---------
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c  |  48 ++++-------
 drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c   |   8 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c   |   8 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c |   8 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c   |  16 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   |  16 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   |  16 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c |  16 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   |  16 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c   |  16 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c   |  16 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c   |  16 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c    |   7 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c    |   7 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c    |   7 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c    |   7 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c    |   9 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c    |   8 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c    |   7 +-
 28 files changed, 319 insertions(+), 345 deletions(-)
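For readers weighing the readability question, the following is a rough userspace sketch of what the two address helpers from the cover letter might look like. It is an assumption-laden illustration, not the patch itself: `struct toy_ring`, `toy_ring_write_addr()`, `toy_ring_write_addr_flags()` and `TOY_IB_BE_BITS` are hypothetical names, and the meaning of the `(2 << 0)` bit is left open because the thread does not name it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy ring for illustration; not the real struct amdgpu_ring. */
struct toy_ring {
	uint32_t *buf;     /* backing storage, size_dw dwords */
	uint32_t size_dw;  /* ring size in dwords, assumed to be a power of two */
	uint32_t wptr;     /* write index in dwords, always < size_dw */
};

static inline void toy_ring_write(struct toy_ring *r, uint32_t v)
{
	r->buf[r->wptr] = v;
	r->wptr = (r->wptr + 1) & (r->size_dw - 1);
}

/* Stand-ins for the kernel's lower_32_bits() and upper_32_bits(). */
static inline uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static inline uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Emit a 64-bit address as two dwords, low half first. */
static inline void toy_ring_write_addr(struct toy_ring *r, uint64_t addr)
{
	toy_ring_write(r, lo32(addr));
	toy_ring_write(r, hi32(addr));
}

/* Same, but OR extra bits into the low dword. */
static inline void toy_ring_write_addr_flags(struct toy_ring *r, uint64_t addr,
					     uint32_t low_flags)
{
	/* The address must be aligned so that the flag bits are free. */
	assert((lo32(addr) & low_flags) == 0);

	toy_ring_write(r, low_flags | lo32(addr));
	toy_ring_write(r, hi32(addr));
}

/*
 * The unnamed big-endian-only bit from the cover letter. Compiler byte
 * order macros are used here in place of the kernel's __BIG_ENDIAN.
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define TOY_IB_BE_BITS (2u << 0)
#else
#define TOY_IB_BE_BITS 0u
#endif

/* Placeholder-named variant mirroring the _xbe helper from the cover letter. */
static inline void toy_ring_write_addr_xbe(struct toy_ring *r, uint64_t addr)
{
	toy_ring_write_addr_flags(r, addr, TOY_IB_BE_BITS);
}
```

One possible design choice, shown above, is to keep the endian-dependent constant in a single named macro so that call sites carry no `#ifdef`; whether that reads better than the open-coded form is exactly the question the cover letter raises.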