RE: [PATCH] drm/amdgpu: Fix missing drain retry fault the last entry

[AMD Official Use Only - AMD Internal Distribution Only]

>-----Original Message-----
>From: Yang, Philip <Philip.Yang@xxxxxxx>
>Sent: Tuesday, March 4, 2025 11:00 PM
>To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>Subject: Re: [PATCH] drm/amdgpu: Fix missing drain retry fault the last entry
>
>
>On 2025-03-03 19:44, Deng, Emily wrote:
>> [AMD Official Use Only - AMD Internal Distribution Only]
>>
>> Ping......
>>
>> Emily Deng
>> Best Wishes
>>
>>
>>
>>> -----Original Message-----
>>> From: Emily Deng <Emily.Deng@xxxxxxx>
>>> Sent: Monday, March 3, 2025 5:35 PM
>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>> Cc: Deng, Emily <Emily.Deng@xxxxxxx>
>>> Subject: [PATCH] drm/amdgpu: Fix missing drain retry fault the last
>>> entry
>>>
>>> For the equal case, the entry also needs to be dropped.
>>>
>>> Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>> index 7d4395a5d8ac..73b8bcb54734 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>> @@ -76,7 +76,7 @@ struct amdgpu_ih_ring {
>>>
>>> /* return true if time stamp t2 is after t1 with 48bit wrap around */
>>> #define amdgpu_ih_ts_after(t1, t2) \
>>> -              (((int64_t)((t2) << 16) - (int64_t)((t1) << 16)) > 0LL)
>>> +              (((int64_t)((t2) << 16) - (int64_t)((t1) << 16)) >= 0LL)
>
>The comment is correct and the current condition is correct too:
>svm_range_drain_retry_fault drops the retry fault with the same timestamp as the IH
>checkpoint_wptr timestamp. Do you see a GPU page fault from a stale retry fault after
>process exit, or what issue do you want to fix?
>
>Regards,
>
>Philip

This commit aims to address the page fault issue observed when running kfdtest with the following command:

sudo bash -c "export GTEST_REPEAT=1000; export GTEST_BREAK_ON_FAILURE=1; ./kfdtest --gtest_filter=KFDSVMRangeTest.*:-KFDSVMRangeTest.ReadOnlyRangeTest* --timeout 100000" 2>/dev/null

The issue is severe because the page fault triggers a call to kfd_set_dbg_ev_from_interrupt, which subsequently invokes kfd_dqm_evict_pasid. As a result, the process is evicted, leading to test failures. This commit resolves the root cause of the page fault to prevent such evictions.
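
To make the difference between the two conditions concrete, here is a minimal userspace sketch of the 48-bit wrap-around comparison. The helper names (ts_after_strict, ts_after_or_equal, ts_compare_demo) are hypothetical and exist only for this illustration; only the shift-and-compare expression mirrors the macro in the diff. The subtraction is done on unsigned values before the cast, an assumption made here so that plain C outside the kernel build does not rely on signed overflow. How a caller uses the result to drop entries is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: same test as the current macro (strictly after). */
static bool ts_after_strict(uint64_t t1, uint64_t t2)
{
	/* Move the 48-bit stamps into the top bits so that the sign of the
	 * 64-bit difference reflects the ordering across a 48-bit wrap. */
	return (int64_t)((t2 << 16) - (t1 << 16)) > 0;
}

/* Hypothetical helper: same test as the proposed macro (after or equal). */
static bool ts_after_or_equal(uint64_t t1, uint64_t t2)
{
	return (int64_t)((t2 << 16) - (t1 << 16)) >= 0;
}

/* Hypothetical self-check covering the cases discussed in this thread. */
static void ts_compare_demo(void)
{
	const uint64_t max48 = 0xFFFFFFFFFFFFULL; /* largest 48-bit timestamp */

	/* Ordinary case: both variants agree. */
	assert(ts_after_strict(100, 101));
	assert(ts_after_or_equal(100, 101));

	/* Wrap-around: 5 comes six ticks after max48, not before it. */
	assert(ts_after_strict(max48, 5));
	assert(!ts_after_strict(5, max48));

	/* Equal timestamps: the only input where the two variants differ. */
	assert(!ts_after_strict(100, 100));
	assert(ts_after_or_equal(100, 100));
}
```

Calling ts_compare_demo() from a small test program exercises all six cases. The equal-timestamp case is the only one affected by changing > to >=.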

Emily Deng
Best Wishes


>
>
>>>
>>> /* provided by the ih block */
>>> struct amdgpu_ih_funcs {
>>> --
>>> 2.34.1



