[+Christian]
OK, this comes from the ttm_bo_wait_ctx call in this section of
amdgpu_bo_move:
	if ((old_mem->mem_type == TTM_PL_TT ||
	     old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
	    new_mem->mem_type == TTM_PL_SYSTEM) {
		r = ttm_bo_wait_ctx(bo, ctx);
		if (r)
			return r;

		amdgpu_ttm_backend_unbind(bo->bdev, bo->ttm);
		ttm_resource_free(bo, &bo->resource);
		ttm_bo_assign_mem(bo, new_mem);
		goto out;
	}
We can't just remove this wait. It's not even specific to KFD or DMABuf
imports. We also can't simply change it to avoid waiting for eviction
fences, because the same path is used for GTT BOs (e.g. before a BO gets
swapped out under extreme memory pressure), so in the general case we
still need to trigger the eviction fence.
In the specific case of DMABuf imports, they share the reservation
object with the original BO, so waiting on the reservation triggers the
eviction fence on the original BO. I think we want to avoid waiting on
eviction fences for all BOs whose underlying memory is managed by some
other BO, and at the same time avoid ever evicting the DMABuf import BO
itself. That's what AMDGPU_PL_PREEMPT is for. So I think a combination
of two changes should do the trick:
1. Change kfd_mem_dmamap_dmabuf to use AMDGPU_GEM_DOMAIN_PREEMPTIBLE
2. Add a special case in the above if-block for old_mem->mem_type ==
AMDGPU_PL_PREEMPT: use amdgpu_bo_sync_wait with
owner=AMDGPU_FENCE_OWNER_KFD so that it doesn't wait for eviction fences
Regards,
Felix
Am 2023-04-04 um 10:36 schrieb Eric Huang:
Here is the backtrace from Jira:
[Thu Nov 10 13:10:23 2022] Scheduling eviction of pid 97784 in 0 jiffies
[Thu Nov 10 13:10:23 2022] WARNING: CPU: 173 PID: 97784 at
/var/lib/dkms/amdgpu/5.16.9.22.20-1438746~20.04/build/amd/amdgpu/../amdkfd/kfd_device.c:878
kgd2kfd_schedule_evict_and_restore_process+0x104/0x120 [amdgpu]
[Thu Nov 10 13:10:23 2022] Modules linked in: veth amdgpu(OE)
amddrm_ttm_helper(OE) amdttm(OE) iommu_v2 amd_sched(OE) amdkcl(OE)
xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user
xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc
aufs overlay binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac
scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common amd64_edac
edac_mce_amd kvm_amd kvm efi_pstore rapl ipmi_ssif ccp acpi_ipmi
k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr
ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs
ib_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
crypto_simd cryptd ast drm_vram_helper drm_ttm_helper ttm mlx5_core
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[Thu Nov 10 13:10:23 2022] pci_hyperv_intf cec psample igb mlxfw
rc_core dca ahci xhci_pci tls drm i2c_algo_bit libahci
xhci_pci_renesas i2c_piix4
[Thu Nov 10 13:10:23 2022] CPU: 173 PID: 97784 Comm: onnxruntime_tes
Tainted: G W OE 5.13.0-30-generic #33~20.04.1-Ubuntu
[Thu Nov 10 13:10:23 2022] Hardware name: GIGABYTE
G482-Z53-YF/MZ52-G40-00, BIOS R12 05/13/2020
[Thu Nov 10 13:10:23 2022] RIP:
0010:kgd2kfd_schedule_evict_and_restore_process+0x104/0x120 [amdgpu]
[Thu Nov 10 13:10:23 2022] Code: 5e 5d c3 4c 89 e7 e8 cb c6 44 df eb
e7 49 8b 45 60 48 89 ca 48 c7 c7 38 8b d7 c1 48 89 4d e0 8b b0 20 09
00 00 e8 87 ee 7e df <0f> 0b 48 8b 4d e0 eb 9f 41 be ea ff ff ff eb ba
41 be ed ff ff ff
[Thu Nov 10 13:10:23 2022] RSP: 0018:ffffb25f2a173978 EFLAGS: 00010086
[Thu Nov 10 13:10:23 2022] RAX: 0000000000000000 RBX: 0000000000000001
RCX: 0000000000000027
[Thu Nov 10 13:10:23 2022] RDX: 0000000000000027 RSI: 00000000fffeffff
RDI: ffff95d06e4a09c8
[Thu Nov 10 13:10:23 2022] RBP: ffffb25f2a173998 R08: ffff95d06e4a09c0
R09: ffffb25f2a173750
[Thu Nov 10 13:10:23 2022] R10: 0000000000000001 R11: 0000000000000001
R12: ffff95c371d74580
[Thu Nov 10 13:10:23 2022] R13: ffff95b1cd3f2000 R14: 0000000000000000
R15: ffff95c371d74580
[Thu Nov 10 13:10:23 2022] FS: 00007fcaff268b00(0000)
GS:ffff95d06e480000(0000) knlGS:0000000000000000
[Thu Nov 10 13:10:23 2022] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[Thu Nov 10 13:10:23 2022] CR2: 00007fc643980000 CR3: 00000003e9492000
CR4: 0000000000350ee0
[Thu Nov 10 13:10:23 2022] Call Trace:
[Thu Nov 10 13:10:23 2022] <TASK>
[Thu Nov 10 13:10:23 2022] amdkfd_fence_enable_signaling+0x46/0x50
[amdgpu]
[Thu Nov 10 13:10:23 2022] __dma_fence_enable_signaling+0x52/0xb0
[Thu Nov 10 13:10:23 2022] dma_fence_default_wait+0xa9/0x200
[Thu Nov 10 13:10:23 2022] dma_fence_wait_timeout+0xbd/0xe0
[Thu Nov 10 13:10:23 2022] amddma_resv_wait_timeout+0x6f/0xd0 [amdkcl]
[Thu Nov 10 13:10:23 2022] amdttm_bo_wait+0x39/0x50 [amdttm]
[Thu Nov 10 13:10:23 2022] amdgpu_bo_move+0x41e/0x7b0 [amdgpu]
[Thu Nov 10 13:10:23 2022] ? down_write+0x13/0x50
[Thu Nov 10 13:10:23 2022] ? unmap_mapping_pages+0x68/0x130
[Thu Nov 10 13:10:23 2022] ttm_bo_handle_move_mem+0x7f/0x120 [amdttm]
[Thu Nov 10 13:10:23 2022] amdttm_bo_validate+0xbf/0x100 [amdttm]
[Thu Nov 10 13:10:23 2022] kfd_mem_dmaunmap_attachment+0x131/0x140
[amdgpu]
[Thu Nov 10 13:10:23 2022] unmap_bo_from_gpuvm+0x67/0x80 [amdgpu]
[Thu Nov 10 13:10:23 2022]
amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x114/0x220 [amdgpu]
[Thu Nov 10 13:10:23 2022] ? __mod_memcg_lruvec_state+0x22/0xe0
[Thu Nov 10 13:10:23 2022] kfd_ioctl_unmap_memory_from_gpu+0xe8/0x270
[amdgpu]
[Thu Nov 10 13:10:23 2022] kfd_ioctl+0x23c/0x590 [amdgpu]
[Thu Nov 10 13:10:23 2022] ?
kfd_ioctl_get_process_apertures_new+0x330/0x330 [amdgpu]
[Thu Nov 10 13:10:23 2022] ? exit_to_user_mode_prepare+0x3d/0x1c0
[Thu Nov 10 13:10:23 2022] ? __fget_files+0xa7/0xd0
[Thu Nov 10 13:10:23 2022] __x64_sys_ioctl+0x91/0xc0
[Thu Nov 10 13:10:23 2022] do_syscall_64+0x61/0xb0
[Thu Nov 10 13:10:23 2022] ? do_syscall_64+0x6e/0xb0
[Thu Nov 10 13:10:23 2022] ? do_syscall_64+0x6e/0xb0
[Thu Nov 10 13:10:23 2022] ? do_syscall_64+0x6e/0xb0
[Thu Nov 10 13:10:23 2022] ? do_syscall_64+0x6e/0xb0
[Thu Nov 10 13:10:23 2022] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
[Thu Nov 10 13:10:23 2022] entry_SYSCALL_64_after_hwframe+0x44/0xae
[Thu Nov 10 13:10:23 2022] RIP: 0033:0x7fcaff57b3ab
[Thu Nov 10 13:10:23 2022] Code: 0f 1e fa 48 8b 05 e5 7a 0d 00 64 c7
00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa
b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00
f7 d8 64 89 01 48
[Thu Nov 10 13:10:23 2022] RSP: 002b:00007fffe41e0098 EFLAGS: 00000206
ORIG_RAX: 0000000000000010
[Thu Nov 10 13:10:23 2022] RAX: ffffffffffffffda RBX: 00007fcacc7f7f80
RCX: 00007fcaff57b3ab
[Thu Nov 10 13:10:23 2022] RDX: 00007fffe41e0120 RSI: 00000000c0184b19
RDI: 0000000000000003
[Thu Nov 10 13:10:23 2022] RBP: 00007fffe41e00d0 R08: 0000562e2d5730d0
R09: 0000000000000000
[Thu Nov 10 13:10:23 2022] R10: 0000562e2c928ec0 R11: 0000000000000206
R12: 0000000000000001
[Thu Nov 10 13:10:23 2022] R13: 00007fffe41e04b0 R14: 0000000000000000
R15: 0000562e2d3f5b20
[Thu Nov 10 13:10:23 2022] </TASK>
[Thu Nov 10 13:10:23 2022] ---[ end trace 1464f08f6be60b30 ]---
Regards,
Eric
On 2023-04-04 10:11, Felix Kuehling wrote:
If we keep the BO in the GTT domain, it means it will not be updated
if we validate it again later in kfd_mem_dmamap_dmabuf. This means
we'll use stale DMA addresses when we update the page tables after
evictions.
I think we'll need to find a different way to avoid triggering the
eviction fence on the original BO when changing the placement of the
DMABuf import here. If you need help brainstorming here, please share
a backtrace from the eviction generated with the debug_evictions
module param.
Regards,
Felix
Am 2023-04-03 um 13:59 schrieb Eric Huang:
The dmabuf is allocated/mapped in the GTT domain. When dma-unmapping the
dmabuf, changing its placement to CPU triggers a memory eviction in
ttm_bo_validate, and that eviction causes a performance drop. Keeping
the correct domain avoids the issue.
Signed-off-by: Eric Huang <jinhuieric.huang@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a3b09edfd1bf..17b708acb447 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -642,7 +642,7 @@ kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
 	struct ttm_operation_ctx ctx = {.interruptible = true};
 	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
 
-	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
 	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 }