On 09/10/2018 05:33 PM, Christian König wrote:
> On 10.09.2018 at 04:44, Zhang, Jerry (Junwei) wrote:
>> On 09/10/2018 02:04 AM, Christian König wrote:
>>> Make a VM mapping which is as unaligned as possible.
>>
>> Is it going to test unaligned address between BO allocation and BO mapping
>> and skip huge page mapping?
>
> Yes and no.
>
> Huge page handling works by mapping at least 2MB of contiguous memory
> on a 2MB aligned address.
>
> What I do here is I allocate 4GB of VRAM and try to map it to an
> address which is aligned to 1GB + 4KB.
>
> In other words the VM subsystem will add a single PTE to align the
> entry to 8KB, then it adds two PTEs to align it to 16KB, then four to
> get to 32KB and so on until we have the maximum alignment of 2GB
> which Vega/Raven support in the L1.

Thanks for explaining that.

From the trace log, it will map 1*4KB, 2*4KB, ..., 256*4KB, then back to 1*4KB.

amdgpu_test-1384 [005] .... 110.634466: amdgpu_vm_bo_update: soffs=0000100001, eoffs=00001fffff, flags=70
amdgpu_test-1384 [005] .... 110.634467: amdgpu_vm_set_ptes: pe=f5feffd008, addr=01fec00000, incr=4096, flags=71, count=1
amdgpu_test-1384 [005] .... 110.634468: amdgpu_vm_set_ptes: pe=f5feffd010, addr=01fec01000, incr=4096, flags=f1, count=2
amdgpu_test-1384 [005] .... 110.634468: amdgpu_vm_set_ptes: pe=f5feffd020, addr=01fec03000, incr=4096, flags=171, count=4
amdgpu_test-1384 [005] .... 110.634468: amdgpu_vm_set_ptes: pe=f5feffd040, addr=01fec07000, incr=4096, flags=1f1, count=8
amdgpu_test-1384 [005] .... 110.634468: amdgpu_vm_set_ptes: pe=f5feffd080, addr=01fec0f000, incr=4096, flags=271, count=16
amdgpu_test-1384 [005] .... 110.634468: amdgpu_vm_set_ptes: pe=f5feffd100, addr=01fec1f000, incr=4096, flags=2f1, count=32
amdgpu_test-1384 [005] .... 110.634469: amdgpu_vm_set_ptes: pe=f5feffd200, addr=01fec3f000, incr=4096, flags=371, count=64
amdgpu_test-1384 [005] .... 110.634469: amdgpu_vm_set_ptes: pe=f5feffd400, addr=01fec7f000, incr=4096, flags=3f1, count=128
amdgpu_test-1384 [005] .... 110.634469: amdgpu_vm_set_ptes: pe=f5feffd800, addr=01fecff000, incr=4096, flags=471, count=256
amdgpu_test-1384 [005] .... 110.634469: amdgpu_vm_set_ptes: pe=f5feffc000, addr=01fedff000, incr=4096, flags=71, count=1
amdgpu_test-1384 [005] .... 110.634470: amdgpu_vm_set_ptes: pe=f5feffc008, addr=01fea00000, incr=4096, flags=71, count=1
amdgpu_test-1384 [005] .... 110.634470: amdgpu_vm_set_ptes: pe=f5feffc010, addr=01fea01000, incr=4096, flags=f1, count=2

This also looks like a performance test for Vega and later.
If so, shall we add timestamps to the log?

Regards,
Jerry

>
> Regards,
> Christian.
>
>>
>>>
>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> ---
>>>  tests/amdgpu/vm_tests.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 44 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/amdgpu/vm_tests.c b/tests/amdgpu/vm_tests.c
>>> index 7b6dc5d6..fada2987 100644
>>> --- a/tests/amdgpu/vm_tests.c
>>> +++ b/tests/amdgpu/vm_tests.c
>>> @@ -31,8 +31,8 @@ static amdgpu_device_handle device_handle;
>>>  static uint32_t major_version;
>>>  static uint32_t minor_version;
>>>
>>> -
>>>  static void amdgpu_vmid_reserve_test(void);
>>> +static void amdgpu_vm_unaligned_map(void);
>>>
>>>  CU_BOOL suite_vm_tests_enable(void)
>>>  {
>>> @@ -84,6 +84,7 @@ int suite_vm_tests_clean(void)
>>>
>>>  CU_TestInfo vm_tests[] = {
>>>  	{ "resere vmid test", amdgpu_vmid_reserve_test },
>>> +	{ "unaligned map", amdgpu_vm_unaligned_map },
>>>  	CU_TEST_INFO_NULL,
>>>  };
>>>
>>> @@ -167,3 +168,45 @@ static void amdgpu_vmid_reserve_test(void)
>>>  	r = amdgpu_cs_ctx_free(context_handle);
>>>  	CU_ASSERT_EQUAL(r, 0);
>>>  }
>>> +
>>> +static void amdgpu_vm_unaligned_map(void)
>>> +{
>>> +	const uint64_t map_size = (4ULL << 30) - (2 << 12);
>>> +	struct amdgpu_bo_alloc_request request = {};
>>> +	amdgpu_bo_handle buf_handle;
>>> +	amdgpu_va_handle handle;
>>> +	uint64_t vmc_addr;
>>> +	int r;
>>> +
>>> +	request.alloc_size = 4ULL << 30;
>>> +	request.phys_alignment = 4096;
>>> +	request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM;
>>> +	request.flags = AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
>>> +
>>> +	r = amdgpu_bo_alloc(device_handle, &request, &buf_handle);
>>> +	/* Don't let the test fail if the device doesn't have enough VRAM */
>>
>> We may print some info to the console here.
>>
>> Regards,
>> Jerry
>>
>>> +	if (r)
>>> +		return;
>>> +
>>> +	r = amdgpu_va_range_alloc(device_handle, amdgpu_gpu_va_range_general,
>>> +				  4ULL << 30, 1ULL << 30, 0, &vmc_addr,
>>> +				  &handle, 0);
>>> +	CU_ASSERT_EQUAL(r, 0);
>>> +	if (r)
>>> +		goto error_va_alloc;
>>> +
>>> +	vmc_addr += 1 << 12;
>>> +
>>> +	r = amdgpu_bo_va_op(buf_handle, 0, map_size, vmc_addr, 0,
>>> +			    AMDGPU_VA_OP_MAP);
>>> +	CU_ASSERT_EQUAL(r, 0);
>>> +	if (r)
>>> +		goto error_va_alloc;
>>> +
>>> +	amdgpu_bo_va_op(buf_handle, 0, map_size, vmc_addr, 0,
>>> +			AMDGPU_VA_OP_UNMAP);
>>> +
>>> +error_va_alloc:
>>> +	amdgpu_bo_free(buf_handle);
>>> +
>>> +}
>>>
>
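
For anyone reading the trace, the 1, 2, 4, ..., 256 progression can be reproduced with a small model. This is only an illustrative sketch and not the amdgpu kernel code: the helpers fragment_pages() and walk_fragments() are hypothetical names. It assumes each step maps the largest power-of-two run of 4KB pages that the current page number is naturally aligned to, limited by how many pages are left. It ignores physical contiguity, which is presumably why the real trace drops back to count=1 after the 256-page step.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the amdgpu driver): largest power-of-two
 * page count that the alignment of `page` allows and that still fits into
 * `remaining` pages.
 */
static uint64_t fragment_pages(uint64_t page, uint64_t remaining)
{
	uint64_t count;

	assert(remaining > 0);

	/*
	 * The lowest set bit of the page number is its natural alignment in
	 * pages. Page 0 is aligned to everything, so start from the top bit.
	 */
	count = page ? (page & (~page + 1)) : (1ULL << 63);

	/* Shrink the fragment until it fits into what is left to map. */
	while (count > remaining)
		count >>= 1;

	return count;
}

/*
 * Walk a range starting at `page` and store up to `max` fragment sizes
 * (in pages) in `out`. Returns the number of fragments stored.
 */
static unsigned walk_fragments(uint64_t page, uint64_t remaining,
			       uint64_t *out, unsigned max)
{
	unsigned n = 0;

	while (remaining > 0 && n < max) {
		uint64_t count = fragment_pages(page, remaining);

		out[n++] = count;
		page += count;
		remaining -= count;
	}

	return n;
}
```

Calling walk_fragments(0x100001, 511, out, 16) with the start page from the soffs value in the trace gives nine fragments of 1, 2, 4, 8, 16, 32, 64, 128 and 256 pages, which lines up with the count= values of the first nine amdgpu_vm_set_ptes entries.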