Re: [PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout

Felix Kuehling <felix.kuehling@xxxxxxx> · Wed, 13 Dec 2023 11:23:30 -0500

On 2023-12-13 10:24, James Zhu wrote:
Ping ...

On 2023-12-08 18:01, James Zhu wrote:
When an application tries to allocate all system memory and causes
memory to be swapped out, hmm_range_fault needs more time to validate
the remaining pages for the allocation. To be safe, increase the
timeout value to 1 second per 64MB of range.

Signed-off-by: James Zhu <James.Zhu@xxxxxxx>

This is not the first time we're increasing this timeout. Eventually
we should get rid of it and find a way to make this work reliably
without a timeout. There can always be situations where faults take
longer, and we should not fail randomly in those cases.

There are also some FIXMEs in this code that should be addressed at the 
same time.

That said, as a short-term fix, this patch is

Acked-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 081267161d40..b24eb5821fd1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -190,8 +190,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
          pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
              hmm_range->start, hmm_range->end);
-        /* Assuming 128MB takes maximum 1 second to fault page address */
-        timeout = max((hmm_range->end - hmm_range->start) >> 27, 1UL);
+        /* Assuming 64MB takes maximum 1 second to fault page address */
+        timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
          timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
          timeout = jiffies + msecs_to_jiffies(timeout);
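
For reference, the effect of the shift change can be checked in user space. The following is only a rough sketch of the arithmetic in the hunk above, not kernel code: sketch_timeout_ms() is a hypothetical helper, and SKETCH_DEFAULT_TIMEOUT_MS assumes HMM_RANGE_DEFAULT_TIMEOUT is 1000 (milliseconds). The jiffies conversion is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for HMM_RANGE_DEFAULT_TIMEOUT, taken to be 1000 ms. */
#define SKETCH_DEFAULT_TIMEOUT_MS 1000ULL

/*
 * Hypothetical helper, not a kernel function: time budget in milliseconds
 * for faulting the range [start, end). One default timeout is granted per
 * 2^shift bytes, and never less than one in total.
 */
static uint64_t sketch_timeout_ms(uint64_t start, uint64_t end,
                                  unsigned int shift)
{
    assert(end >= start && shift < 64);

    uint64_t units = (end - start) >> shift;  /* whole 2^shift-byte chunks */

    if (units < 1)
        units = 1;                            /* max(..., 1UL) in the patch */
    return units * SKETCH_DEFAULT_TIMEOUT_MS;
}
```

Under those assumptions, a 1GB range gets 8 seconds with the old shift of 27 and 16 seconds with the new shift of 26. Any range smaller than 64MB still gets the 1 second minimum in both cases, so the patch only helps larger ranges.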