Re: [PATCH 2/2] drm/amdkfd: change svm range evict

On 2022-06-30 11:19, Eric Huang wrote:

On 2022-06-29 19:29, Felix Kuehling wrote:
On 2022-06-29 18:53, Eric Huang wrote:


On 2022-06-29 18:20, Felix Kuehling wrote:
On 2022-06-28 17:43, Eric Huang wrote:
Two changes:
1. Reduce unnecessary evict/unmap when the range is not mapped to the GPU.
2. Always evict when the range's flags have always_mapped set.

Signed-off-by: Eric Huang <jinhuieric.huang@xxxxxxx>
---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 ++++++++--
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 4bf2f75f853b..76e817687ef9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
 	struct kfd_process *p;
 	int r = 0;
 
+	if (!prange->mapped_to_gpu)
+		return 0;

This feels like an unrelated optimization that should be in a separate patch.

But I'm not sure this is correct, because it doesn't consider child ranges. svm_range_unmap_from_gpus already contains this check, so ranges should not be unmapped unnecessarily either way. Is there any other benefit to this change that I'm missing?
I will send another patch separately that considers child ranges.

I think this should only be done in the XNACK-off case. For XNACK-on it's already handled in the svm_range_unmap_from_gpus.
Yes, and it is also done when KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED is set.


The benefit is that it reduces unnecessary queue evictions when allocating context save memory, which is not yet mapped to the GPU.

That sounds wrong. The context save area should never be unmapped from GPU. That's the whole point of setting the KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag. I guess this is happening while migrating the context save area to VRAM for the first time, even before it's mapped to GPU?
Yes. It happens the first time, when registering the svm range and migrating it to VRAM are done together; at that moment the range is not yet mapped to the GPU.

I think there may be another eviction, when the CWSR header is initialized by the CPU. That would also migrate it back to system memory. To avoid that, you should probably register the context save area only after the header has been initialized.
Yes, that is the approach I am using. Please look at patch 4/4.

I think avoiding an eviction when memory is migrated when it is first registered is worthwhile, not just for CWSR.


It is for efficiency reasons. On the other hand, without this optimization KFDCWSRTest.InterruptRestore fails with a queue preemption error.

What do you mean by "queue preemption error"? Does HWS hang?
HWS doesn't hang immediately, so there is no fence-timeout error ("The cp might be in an unrecoverable state due to an unsuccessful queues preemption"). The error is "HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out", reported after checking the MQD manager, which means HWS abandoned the unmap-queues request without returning a timeout error to the driver. After this error, the following test fails at queue creation because HWS hangs.

OK, that sounds like the kind of bug the InterruptRestore test is meant to catch. I think you just created a better test by causing more preemptions. ;)

So we should do two things:

 * Avoid unnecessary preemptions in KFD
 * Improve the test to reproduce this hang even without unnecessary
   preemptions in KFD, so we can investigate the issue

Regards,
  Felix




I think the reason is that the extra queue evictions make HWS too busy to preempt existing queues. The current code sends one unmap_queues packet to HWS; with unified memory allocation there would be three unmap_queues packets.

When queues of a process are already evicted, they should not get evicted again. That's handled by the qpd->evicted counter. There should never be multiple unmap_queues packets in flight at the same time. If you're seeing three unmap_queues, you should also see queues restored three times.

HWS should never be too busy to evict queues. If you're seeing preemptions fail, you may have found a bug.
The restore delay worker changes the timing: it could restore queues before the next unmap_queues, so the situation is too complicated to debug in an environment with multiple queue evicts/restores. The error definitely means there is a bug. From the driver's point of view nothing is wrong even with the extra queue eviction, so I try to avoid the extra eviction and keep the behavior as before. The bottom line is to make sure a unified svm range for the context save area doesn't cause any failure in kfdtest, so I theoretically assume the extra queue eviction/restore causes the HWS failure.

Regards,
Eric

Regards,
  Felix


So this optimization keeps only one unmap_queues packet, as before.

Regards,
Eric

Regards,
  Felix


+
 	p = container_of(svms, struct kfd_process, svms);
 
 	pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n",
 		 svms, prange->start, prange->last, start, last);
 
-	if (!p->xnack_enabled) {
+	if (!p->xnack_enabled ||
+	    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
 		int evicted_ranges;
 
 		list_for_each_entry(pchild, &prange->child_list, child_list) {
@@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
 		if (r)
 			goto out_unlock_range;
 
-		if (migrated && !p->xnack_enabled) {
+		if (migrated && (!p->xnack_enabled ||
+		    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
+		    prange->mapped_to_gpu) {
 			pr_debug("restore_work will update mappings of GPUs\n");
 			mutex_unlock(&prange->migrate_mutex);
 			continue;




