Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

On 1/3/2024 09:19, Alex Deucher wrote:
> + Jay, Felix
> 
> On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma <ent3rm4n@xxxxxxxxx> wrote:
>>
>> That commit causes NULL pointer dereferences in dmesgs when
>> running applications using ROCm, including clinfo, blender,
>> and PyTorch, since v6.6.1. Revert it to fix blender again.
>>
>> This reverts commit 96c211f1f9ef82183493f4ceed4e347b52849149.
>>
>> Closes: https://github.com/ROCm/ROCm/issues/2596
>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2991
>> Signed-off-by: Kaibo Ma <ent3rm4n@xxxxxxxxx>
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 26 ++++++++++----------
>>  1 file changed, 13 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
>> index 62b205dac..6604a3f99 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
>> @@ -330,12 +330,6 @@ static void kfd_init_apertures_vi(struct kfd_process_device *pdd, uint8_t id)
>>         pdd->gpuvm_limit =
>>                 pdd->dev->kfd->shared_resources.gpuvm_size - 1;
>>
>> -       /* dGPUs: the reserved space for kernel
>> -        * before SVM
>> -        */
>> -       pdd->qpd.cwsr_base = SVM_CWSR_BASE;
>> -       pdd->qpd.ib_base = SVM_IB_BASE;
>> -
>>         pdd->scratch_base = MAKE_SCRATCH_APP_BASE_VI();
>>         pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
>>  }
>> @@ -345,18 +339,18 @@ static void kfd_init_apertures_v9(struct kfd_process_device *pdd, uint8_t id)
>>         pdd->lds_base = MAKE_LDS_APP_BASE_V9();
>>         pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
>>
>> -       pdd->gpuvm_base = PAGE_SIZE;
>> +        /* Raven needs SVM to support graphic handle, etc. Leave the small
>> +         * reserved space before SVM on Raven as well, even though we don't
>> +         * have to.
>> +         * Set gpuvm_base and gpuvm_limit to CANONICAL addresses so that they
>> +         * are used in Thunk to reserve SVM.
>> +         */
>> +        pdd->gpuvm_base = SVM_USER_BASE;
>>         pdd->gpuvm_limit =
>>                 pdd->dev->kfd->shared_resources.gpuvm_size - 1;
>>
>>         pdd->scratch_base = MAKE_SCRATCH_APP_BASE_V9();
>>         pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
>> -
>> -       /*
>> -        * Place TBA/TMA on opposite side of VM hole to prevent
>> -        * stray faults from triggering SVM on these pages.
>> -        */
>> -       pdd->qpd.cwsr_base = pdd->dev->kfd->shared_resources.gpuvm_size;
>>  }
>>
>>  int kfd_init_apertures(struct kfd_process *process)
>> @@ -413,6 +407,12 @@ int kfd_init_apertures(struct kfd_process *process)
>>                                         return -EINVAL;
>>                                 }
>>                         }
>> +
>> +                        /* dGPUs: the reserved space for kernel
>> +                         * before SVM
>> +                         */
>> +                        pdd->qpd.cwsr_base = SVM_CWSR_BASE;
>> +                        pdd->qpd.ib_base = SVM_IB_BASE;
>>                 }
>>
>>                 dev_dbg(kfd_device, "node id %u\n", id);
>> --
>> 2.42.0
>>

I saw a segfault issue in Mesa yesterday. Not sure about the others, but I don't know how to make this change while maintaining compatibility with older UMDs.

So I agree, let's revert it.

Reviewed-by: Jay Cornwall <jay.cornwall@xxxxxxx>


