[AMD Official Use Only - AMD Internal Distribution Only]

> From: Koenig, Christian <Christian.Koenig@xxxxxxx>
> Sent: Friday, May 10, 2024 5:31 PM
> To: Liang, Prike <Prike.Liang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>
> Subject: Re: [PATCH] drm/amdgpu: Use the slab allocator to reduce job allocation fragmentation
>
> Am 10.05.24 um 10:11 schrieb Prike Liang:
> > Using kzalloc() results in about 50% memory fragmentation, therefore
> > use the slab allocator to reduce memory fragmentation.
> >
> > Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  1 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 26 ++++++++++++++++++++-----
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  1 +
> >  3 files changed, 23 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index ea14f1c8f430..3de1b42291b6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -3040,6 +3040,7 @@ static void __exit amdgpu_exit(void)
> >      amdgpu_fence_slab_fini();
> >      mmu_notifier_synchronize();
> >      amdgpu_xcp_drv_release();
> > +    amdgpue_job_slab_fini();
> >  }
> >
> >  module_init(amdgpu_init);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index e4742b65032d..8327bf017a0e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -31,6 +31,8 @@
> >  #include "amdgpu_trace.h"
> >  #include "amdgpu_reset.h"
> >
> > +static struct kmem_cache *amdgpu_job_slab;
> > +
> >  static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
> >  {
> >      struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > @@ -101,10 +103,19 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> >      if (num_ibs == 0)
> >              return -EINVAL;
> >
> > -    *job = kzalloc(struct_size(*job, ibs, num_ibs), GFP_KERNEL);
> > -    if (!*job)
> > +    amdgpu_job_slab = kmem_cache_create("amdgpu_job",
> > +                    struct_size(*job, ibs, num_ibs), 0,
> > +                    SLAB_HWCACHE_ALIGN, NULL);
>
> Well you are declaring a global slab cache for a dynamic job size, then try to
> set it up in the job allocation function which can be called concurrently with
> different number of IBs.
>
> To sum it up this is completely racy and will go boom immediately in testing.
> As far as I can see this suggestion is just utterly nonsense.
>
> Regards,
> Christian.

Hi Christian,

The num_ibs is calculated as 1 in amdgpu_cs_p1_ib(), and amdgpu_cs_pass1() passes it as 1 into amdgpu_job_alloc(). Moreover, num_ibs is only set in amdgpu_cs_p1_ib() and cannot be overwritten from the user-space driver side. I also checked a few GL and Vulkan applications and did not find multiple IBs within a single amdgpu job submission.

If there is still a concern about the IB array size of the allocated amdgpu_job object, we can remove the ibs member and decouple the IB from the job object. We could then export and access the IB as a parameter of a new interface like amdgpu_cs_patch_ibs(struct amdgpu_cs_parser *p, struct amdgpu_job *job, struct amdgpu_ib *ib).

Regarding this patch, using kmem_cache_zalloc() instead of kzalloc() saves about 448 bytes of memory for each allocated amdgpu_job object.
Meanwhile, the job object allocation takes almost the same time as with kzalloc(), so there should be no side effect on performance. If the idea is sensible, I will rework the patch by creating the job slab during the driver probe period; a rough sketch of what I mean follows after the quoted diff below.

Thanks,
Prike

> > +    if (!amdgpu_job_slab) {
> > +            DRM_ERROR("create amdgpu_job cache failed\n");
> >              return -ENOMEM;
> > +    }
> >
> > +    *job = kmem_cache_zalloc(amdgpu_job_slab, GFP_KERNEL);
> > +    if (!*job) {
> > +            kmem_cache_destroy(amdgpu_job_slab);
> > +            return -ENOMEM;
> > +    }
> >      /*
> >       * Initialize the scheduler to at least some ring so that we always
> >       * have a pointer to adev.
> > @@ -138,7 +149,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
> >      if (r) {
> >              if (entity)
> >                      drm_sched_job_cleanup(&(*job)->base);
> > -            kfree(*job);
> > +            kmem_cache_free(amdgpu_job_slab, job);
> >      }
> >
> >      return r;
> > @@ -179,6 +190,11 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)
> >              amdgpu_ib_free(ring->adev, &job->ibs[i], f);
> >  }
> >
> > +void amdgpue_job_slab_fini(void)
> > +{
> > +    kmem_cache_destroy(amdgpu_job_slab);
> > +}
> > +
> >  static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
> >  {
> >      struct amdgpu_job *job = to_amdgpu_job(s_job);
> > @@ -189,7 +205,7 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
> >
> >      /* only put the hw fence if has embedded fence */
> >      if (!job->hw_fence.ops)
> > -            kfree(job);
> > +            kmem_cache_free(amdgpu_job_slab, job);
> >      else
> >              dma_fence_put(&job->hw_fence);
> >  }
> > @@ -221,7 +237,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
> >      dma_fence_put(job->gang_submit);
> >
> >      if (!job->hw_fence.ops)
> > -            kfree(job);
> > +            kmem_cache_free(amdgpu_job_slab, job);
> >      else
> >              dma_fence_put(&job->hw_fence);
> >  }
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > index a963a25ddd62..4491d5f9c74d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > @@ -103,5 +103,6 @@ int amdgpu_job_submit_direct(struct amdgpu_job *job, struct amdgpu_ring *ring,
> >                           struct dma_fence **fence);
> >
> >  void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched);
> > +void amdgpue_job_slab_fini(void);
> >
> >  #endif
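
For illustration only, here is the rough sketch of the rework mentioned above. The helper names (amdgpu_job_slab_init()/amdgpu_job_slab_fini()/amdgpu_job_cache_zalloc()), the init/exit call sites next to the existing fence slab setup, and the fixed one-IB sizing are assumptions for discussion, not the final patch:

#include <linux/slab.h>

#include "amdgpu.h"     /* struct amdgpu_job, struct amdgpu_ib */

static struct kmem_cache *amdgpu_job_slab;

/* Created once, e.g. from amdgpu_init() next to amdgpu_fence_slab_init(),
 * so the allocation path never touches kmem_cache_create(). */
int amdgpu_job_slab_init(void)
{
        /* Fixed size for a single IB, i.e. struct_size(job, ibs, 1). */
        amdgpu_job_slab = kmem_cache_create("amdgpu_job",
                                            sizeof(struct amdgpu_job) +
                                            sizeof(struct amdgpu_ib),
                                            0, SLAB_HWCACHE_ALIGN, NULL);
        return amdgpu_job_slab ? 0 : -ENOMEM;
}

/* Destroyed once from amdgpu_exit(), mirroring amdgpu_fence_slab_fini(). */
void amdgpu_job_slab_fini(void)
{
        kmem_cache_destroy(amdgpu_job_slab);
        amdgpu_job_slab = NULL;
}

/* What amdgpu_job_alloc() would call instead of the per-call
 * kmem_cache_create(): the cache only fits jobs carrying one IB. */
static struct amdgpu_job *amdgpu_job_cache_zalloc(unsigned int num_ibs)
{
        if (WARN_ON(num_ibs != 1))
                return NULL;

        return kmem_cache_zalloc(amdgpu_job_slab, GFP_KERNEL);
}

With the cache created at module init time, the free paths keep the kmem_cache_free(amdgpu_job_slab, ...) calls from the patch above, and the racy create/destroy in the allocation and error paths goes away.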