On 2023-12-13 10:24, James Zhu wrote:
Ping ...
On 2023-12-08 18:01, James Zhu wrote:
When application tries to allocate all system memory and cause memory
to swap out. Needs more time for hmm_range_fault to validate the
remaining page for allocation. To be safe, increase timeout value to
1 second for 64MB range.
Signed-off-by: James Zhu <James.Zhu@xxxxxxx>
This is not the first time we're incrementing this timeout. Eventually
we should get rid of that and find a way to make this work reliably
without a timeout. There can always be situations where faults take
longer, and we should not fail randomly in those cases.
There are also some FIXMEs in this code that should be addressed at the
same time.
That said, as a short-term fix, this patch is
Acked-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 081267161d40..b24eb5821fd1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -190,8 +190,8 @@ int amdgpu_hmm_range_get_pages(struct
mmu_interval_notifier *notifier,
pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
hmm_range->start, hmm_range->end);
- /* Assuming 128MB takes maximum 1 second to fault page
address */
- timeout = max((hmm_range->end - hmm_range->start) >> 27, 1UL);
+ /* Assuming 64MB takes maximum 1 second to fault page
address */
+ timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
timeout = jiffies + msecs_to_jiffies(timeout);