On 24/02/2025 12:06, Tvrtko Ursulin wrote:
A lot of the workloads create jobs with just one IB and if we re-order some
struct members we can stop that allocation spilling into the 1k SLAB bucket.
Before:
sizeof(struct amdgpu_job) + sizeof(struct amdgpu_ib) = 480 + 40 = 520
After:
sizeof(struct amdgpu_job) + sizeof(struct amdgpu_ib) = 472 + 32 = 504
It is not a huge gain in the big picture but every little helps.
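To illustrate how re-ordering alone can shrink a struct, here is a minimal userspace sketch. The structs and member names are hypothetical and do not reproduce the real struct amdgpu_ib layout; the 4-byte hole described in the comments assumes an ABI where uint64_t is 8-byte aligned (such as x86-64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real struct amdgpu_ib. */
struct demo_before {
	uint32_t count;  /* 4 bytes, then a 4-byte hole if uint64_t is 8-byte aligned */
	uint64_t addr;   /* the member that forces the alignment */
	uint32_t flags;  /* 4 bytes, then 4 bytes of tail padding */
};

/* Same members, largest first: the two 32-bit members share one 8-byte slot. */
struct demo_after {
	uint64_t addr;
	uint32_t count;
	uint32_t flags;
};

/* Re-ordering must never make the struct larger. */
static_assert(sizeof(struct demo_after) <= sizeof(struct demo_before),
	      "re-ordering should not grow the struct");

/* Bytes saved by re-ordering alone; no member was removed or shrunk. */
static inline size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_before) - sizeof(struct demo_after);
}
```

On a real kernel struct, pahole reports the holes and padding directly, which is where figures like the ones above come from.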
FWIW it is also quite* possible to make two IB jobs fit into 512 by
converting booleans to flags and shrinking some fields:
/* size: 448, cachelines: 7, members: 24 */
/* forced alignments: 1 */
So 448 + 2 * 32 = 512 !
That avoids spilling _any_ submissions, for example from Cyberpunk 2077,
into the 1k SLAB bucket.
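The bucket arithmetic can be sketched as follows. This is a simplified model that assumes power-of-two size classes only (real kmalloc also has 96 and 192 byte caches, which do not matter for these sizes), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of kmalloc size classes: round the request up to the
 * next power of two, starting from 8 bytes. Assumption for illustration
 * only; this is not the kernel's implementation.
 */
static size_t demo_kmalloc_bucket(size_t size)
{
	size_t bucket = 8;

	while (bucket < size)
		bucket <<= 1;

	return bucket;
}

/* Size of one allocation holding the job followed by num_ibs IBs. */
static size_t demo_job_alloc_size(size_t job_size, size_t ib_size,
				  unsigned int num_ibs)
{
	return job_size + ib_size * num_ibs;
}
```

With the figures quoted in this mail, one IB before the series is 480 + 40 = 520 bytes and lands in the 1024 bucket, one IB after is 472 + 32 = 504 bytes and fits in 512, and the further shrunk 448-byte job with two 32-byte IBs is exactly 512.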
*) I said quite because after I converted booleans to flags, which
required a u16 for 9 flags, and shrank vmid and num_ibs to u8 and
job_run_counter to u16 (all of which seem completely fine), I still needed
just a tiny bit extra. So I shrank gws_size to u16. Being a size in
pages, that should also easily be large enough.
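A rough sketch of the booleans-to-flags idea follows. Everything here is hypothetical: the flag names, the helpers, and the 32-bit "before" widths are assumptions for illustration, and only the member names vmid, num_ibs, job_run_counter and gws_size plus their shrunk widths come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "before": nine separate booleans plus 32-bit fields (assumed widths). */
struct demo_job_before {
	bool     b[9];            /* stands in for nine individual bool members */
	uint32_t vmid;
	uint32_t num_ibs;
	uint32_t job_run_counter;
	uint32_t gws_size;
};

/* Hypothetical "after": one u16 bitmask and narrower fields. */
struct demo_job_after {
	uint16_t flags;           /* nine flags need more than 8 bits, so u16 */
	uint8_t  vmid;
	uint8_t  num_ibs;
	uint16_t job_run_counter;
	uint16_t gws_size;        /* a size in pages, so at most 65535 pages */
};

#define DEMO_JOB_FLAG_COUNT 9
#define DEMO_JOB_FLAG(n)    ((uint16_t)(1u << (n)))  /* n in 0..8 */

static_assert(DEMO_JOB_FLAG_COUNT <= 16, "flags must fit in a u16");

/* Set or clear one flag bit. */
static inline void demo_job_set_flag(struct demo_job_after *job,
				     uint16_t flag, bool on)
{
	if (on)
		job->flags |= flag;
	else
		job->flags &= (uint16_t)~flag;
}

/* Test one flag bit. */
static inline bool demo_job_test_flag(const struct demo_job_after *job,
				      uint16_t flag)
{
	return (job->flags & flag) != 0;
}
```

The cost of this approach is that every former boolean access becomes a mask operation, so small accessors like the ones above help keep call sites readable.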
Regards,
Tvrtko
Tvrtko Ursulin (2):
drm/amdgpu: Remove hole from struct amdgpu_ib
drm/amdgpu: Reduce holes in struct amdgpu_job
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 19 ++++++++-----------
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 4 ++--
2 files changed, 10 insertions(+), 13 deletions(-)