Hi Eric, Ah! Yeah that is a well known issue. Basic problem is that for releasing the BOs we need to reserve them to check if they are idle or not. I've got a branch with a TTM change to avoid that, but essentially that is a huge problem which needs a rather big change in memory management to fix. Regards, Christian. Am 05.11.19 um 17:27 schrieb Huang, JinHuiEric: > Hi Christian, > > I found the reason why page tables are not freed when unmapping. All the > pts are reserved, then they are not freed until vm fini. So the > consequences are old pts and new pts for the same VAs will exist till vm > fini. In KFD big buffer strees test, multiple times of mapping and > unmapping a big range of system memory causes huge vram pts usage > accumulation. > > I tried to avoid generating duplicated pts during unmapping in > amdgpu_vm_update_ptes() by skipping amdgpu_vm_free_pts() and not > reserving the lowest pts, but they didn't work with VM fault. The only > way working is skipping whole function amdgpu_vm_update_ptes(), but it > seems wrong, because we have to update GPU VM MMU. > > So there is no bug in amdgpu_vm_update_ptes(), but the accumulation of > pts vram usage is an overhead. Do you think what we can do to get better > solution? > > Regards, > > Eric > > On 2019-10-31 10:33 a.m., Huang, JinHuiEric wrote: >> The hardware is vega10 and test is KFDMemoryTest.BigBufferStressTest. >> More detail is on Jira SWDEV-201443. >> >> Regards, >> >> Eric >> >> On 2019-10-31 10:08 a.m., StDenis, Tom wrote: >>> I could try it on my carrizo/polaris setup. Is there a test procedure I >>> could folllow to trigger the changed code paths? >>> >>> >>> Tom >>> >>> On 2019-10-31 6:41 a.m., Koenig, Christian wrote: >>>> Just tested this and amdgpu_vm_update_ptes() indeed works as expected. >>>> >>>> When you free at least a 2MB the lowest level of page tables is freed >>>> up again. >>>> >>>> BTW: What hardware have you tested this on? On gfx8 and older it is >>>> expected that page tables are never freed. >>>> >>>> Regards, >>>> Christian. >>>> >>>> Am 30.10.19 um 19:11 schrieb Christian König: >>>>> Then I don't see how this patch actually changes anything. >>>>> >>>>> Could only be a bug in amdgpu_vm_update_ptes(). Going to investigate >>>>> this, but I won't have time to look into the ticket in detail. >>>>> >>>>> Regards, >>>>> Christian. >>>>> >>>>> Am 30.10.19 um 19:00 schrieb Huang, JinHuiEric: >>>>>> Actually I do prevent to remove in-use pts by this: >>>>>> >>>>>> + r = amdgpu_vm_remove_ptes(adev, vm, >>>>>> + (mapping->start + 0x1ff) & (~0x1ffll), >>>>>> + (mapping->last + 1) & (~0x1ffll)); >>>>>> >>>>>> Which is only removing aligned page table for 2M. And I have tested >>>>>> it at least on KFD tests without anything broken. >>>>>> >>>>>> By the way, I am not familiar with memory staff. This patch is the >>>>>> best I can do for now. Could you take a look at the Jira ticket >>>>>> SWDEV-201443 ? and find the better solution. Thanks! >>>>>> >>>>>> Regards, >>>>>> >>>>>> Eric >>>>>> >>>>>> On 2019-10-30 1:57 p.m., Christian König wrote: >>>>>>> One thing I've forgotten: >>>>>>> >>>>>>> What you could maybe do to improve the situation is to join >>>>>>> adjacent ranges in amdgpu_vm_clear_freed(), but I'm not sure how >>>>>>> the chances are that the ranges are freed all together. >>>>>>> >>>>>>> The only other alternative I can see would be to check the mappings >>>>>>> of a range in amdgpu_update_ptes() and see if you could walk the >>>>>>> tree up if the valid flag is not set and there are no mappings left >>>>>>> for a page table. >>>>>>> >>>>>>> Regards, >>>>>>> Christian. >>>>>>> >>>>>>> Am 30.10.19 um 18:42 schrieb Koenig, Christian: >>>>>>>>> The vaild flag doesn't take effect in this function. >>>>>>>> That's irrelevant. >>>>>>>> >>>>>>>> See what amdgpu_vm_update_ptes() does is to first determine the >>>>>>>> fragment size: >>>>>>>>> amdgpu_vm_fragment(params, frag_start, end, flags, &frag, &frag_end); >>>>>>>> Then we walk down the tree: >>>>>>>>> amdgpu_vm_pt_start(adev, params->vm, start, &cursor); >>>>>>>>> while (cursor.pfn < end) { >>>>>>>> And make sure that the page tables covering the address range are >>>>>>>> actually allocated: >>>>>>>>> r = amdgpu_vm_alloc_pts(params->adev, params->vm, >>>>>>>>> &cursor); >>>>>>>> Then we update the tables with the flags and addresses and free up >>>>>>>> subsequent tables in the case of huge pages or freed up areas: >>>>>>>>> /* Free all child entries */ >>>>>>>>> while (cursor.pfn < frag_start) { >>>>>>>>> amdgpu_vm_free_pts(adev, params->vm, &cursor); >>>>>>>>> amdgpu_vm_pt_next(adev, &cursor); >>>>>>>>> } >>>>>>>> This is the maximum you can free, cause all other page tables are >>>>>>>> not completely covered by the range and so potentially still in use. >>>>>>>> >>>>>>>> And I have the strong suspicion that this is what your patch is >>>>>>>> actually doing wrong. In other words you are also freeing page >>>>>>>> tables which are only partially covered by the range and so >>>>>>>> potentially still in use. >>>>>>>> >>>>>>>> Since we don't have any tracking how many entries in a page table >>>>>>>> are currently valid and how many are invalid we actually can't >>>>>>>> implement what you are trying to do here. So the patch is >>>>>>>> definitely somehow broken. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Christian. >>>>>>>> >>>>>>>> Am 30.10.19 um 17:55 schrieb Huang, JinHuiEric: >>>>>>>>> The vaild flag doesn't take effect in this function. >>>>>>>>> amdgpu_vm_alloc_pts() is always executed that only depended on >>>>>>>>> "cursor.pfn < end". The valid flag has only been checked on here >>>>>>>>> for asic below GMC v9: >>>>>>>>> >>>>>>>>> if (adev->asic_type < CHIP_VEGA10 && >>>>>>>>> (flags & AMDGPU_PTE_VALID))... >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> >>>>>>>>> Eric >>>>>>>>> >>>>>>>>> On 2019-10-30 12:30 p.m., Koenig, Christian wrote: >>>>>>>>>> Am 30.10.2019 17:19 schrieb "Huang, JinHuiEric" >>>>>>>>>> <JinHuiEric.Huang@xxxxxxx>: >>>>>>>>>> >>>>>>>>>> I tested it that it saves a lot of vram on KFD big buffer >>>>>>>>>> stress test. I think there are two reasons: >>>>>>>>>> >>>>>>>>>> 1. Calling amdgpu_vm_update_ptes() during unmapping will >>>>>>>>>> allocate unnecessary pts, because there is no flag to >>>>>>>>>> determine if the VA is mapping or unmapping in function >>>>>>>>>> amdgpu_vm_update_ptes(). It saves the most of memory. >>>>>>>>>> >>>>>>>>>> That's not correct. The valid flag is used for this. >>>>>>>>>> >>>>>>>>>> 2. Intentionally removing those unmapping pts is logical >>>>>>>>>> expectation, although it is not removing so much pts. >>>>>>>>>> >>>>>>>>>> Well I actually don't see a change to what update_ptes is doing >>>>>>>>>> and have the strong suspicion that the patch is simply broken. >>>>>>>>>> >>>>>>>>>> You either free page tables which are potentially still in use >>>>>>>>>> or update_pte doesn't free page tables when the valid but is not >>>>>>>>>> set. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Christian. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> >>>>>>>>>> Eric >>>>>>>>>> >>>>>>>>>> On 2019-10-30 11:57 a.m., Koenig, Christian wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Am 30.10.2019 16:47 schrieb "Kuehling, Felix" >>>>>>>>>> <Felix.Kuehling@xxxxxxx> <mailto:Felix.Kuehling@xxxxxxx>: >>>>>>>>>> >>>>>>>>>> On 2019-10-30 9:52 a.m., Christian König wrote: >>>>>>>>>> > Am 29.10.19 um 21:06 schrieb Huang, JinHuiEric: >>>>>>>>>> >> The issue is PT BOs are not freed when unmapping VA, >>>>>>>>>> >> which causes vram usage accumulated is huge in some >>>>>>>>>> >> memory stress test, such as kfd big buffer stress >>>>>>>>>> test. >>>>>>>>>> >> Function amdgpu_vm_bo_update_mapping() is called >>>>>>>>>> by both >>>>>>>>>> >> amdgpu_vm_bo_update() and >>>>>>>>>> amdgpu_vm_clear_freed(). The >>>>>>>>>> >> solution is replacing >>>>>>>>>> amdgpu_vm_bo_update_mapping() in >>>>>>>>>> >> amdgpu_vm_clear_freed() with removing PT BOs function >>>>>>>>>> >> to save vram usage. >>>>>>>>>> > >>>>>>>>>> > NAK, that is intentional behavior. >>>>>>>>>> > >>>>>>>>>> > Otherwise we can run into out of memory situations >>>>>>>>>> when page tables >>>>>>>>>> > need to be allocated again under stress. >>>>>>>>>> >>>>>>>>>> That's a bit arbitrary and inconsistent. We are >>>>>>>>>> freeing page tables in >>>>>>>>>> other situations, when a mapping uses huge pages in >>>>>>>>>> amdgpu_vm_update_ptes. Why not when a mapping is >>>>>>>>>> destroyed completely? >>>>>>>>>> >>>>>>>>>> I'm actually a bit surprised that the huge-page >>>>>>>>>> handling in >>>>>>>>>> amdgpu_vm_update_ptes isn't kicking in to free up >>>>>>>>>> lower-level page >>>>>>>>>> tables when a BO is unmapped. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Well it does free the lower level, and that is already >>>>>>>>>> causing problems (that's why I added the reserved space). >>>>>>>>>> >>>>>>>>>> What we don't do is freeing the higher levels. >>>>>>>>>> >>>>>>>>>> E.g. when you free a 2MB BO we free the lowest level, if >>>>>>>>>> we free a 1GB BO we free the two lowest levels etc... >>>>>>>>>> >>>>>>>>>> The problem with freeing the higher levels is that you >>>>>>>>>> don't know who is also using this. E.g. we would need to >>>>>>>>>> check all entries when we unmap one. >>>>>>>>>> >>>>>>>>>> It's simply not worth it for a maximum saving of 2MB per VM. >>>>>>>>>> >>>>>>>>>> Writing this I'm actually wondering how you ended up in >>>>>>>>>> this issue? There shouldn't be much savings from this. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Christian. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Felix >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > >>>>>>>>>> > Regards, >>>>>>>>>> > Christian. >>>>>>>>>> > >>>>>>>>>> >> >>>>>>>>>> >> Change-Id: Ic24e35bff8ca85265b418a642373f189d972a924 >>>>>>>>>> >> Signed-off-by: Eric Huang >>>>>>>>>> <JinhuiEric.Huang@xxxxxxx> >>>>>>>>>> <mailto:JinhuiEric.Huang@xxxxxxx> >>>>>>>>>> >> --- >>>>>>>>>> >> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 56 >>>>>>>>>> >> +++++++++++++++++++++++++++++----- >>>>>>>>>> >> 1 file changed, 48 insertions(+), 8 deletions(-) >>>>>>>>>> >> >>>>>>>>>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>>>>> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>>>>> >> index 0f4c3b2..8a480c7 100644 >>>>>>>>>> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>>>>> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>>>>> >> @@ -1930,6 +1930,51 @@ static void >>>>>>>>>> amdgpu_vm_prt_fini(struct >>>>>>>>>> >> amdgpu_device *adev, struct amdgpu_vm *vm) >>>>>>>>>> >> } >>>>>>>>>> >> /** >>>>>>>>>> >> + * amdgpu_vm_remove_ptes - free PT BOs >>>>>>>>>> >> + * >>>>>>>>>> >> + * @adev: amdgpu device structure >>>>>>>>>> >> + * @vm: amdgpu vm structure >>>>>>>>>> >> + * @start: start of mapped range >>>>>>>>>> >> + * @end: end of mapped entry >>>>>>>>>> >> + * >>>>>>>>>> >> + * Free the page table level. >>>>>>>>>> >> + */ >>>>>>>>>> >> +static int amdgpu_vm_remove_ptes(struct >>>>>>>>>> amdgpu_device *adev, >>>>>>>>>> >> + struct amdgpu_vm *vm, uint64_t start, uint64_t end) >>>>>>>>>> >> +{ >>>>>>>>>> >> + struct amdgpu_vm_pt_cursor cursor; >>>>>>>>>> >> + unsigned shift, num_entries; >>>>>>>>>> >> + >>>>>>>>>> >> + amdgpu_vm_pt_start(adev, vm, start, &cursor); >>>>>>>>>> >> + while (cursor.level < AMDGPU_VM_PTB) { >>>>>>>>>> >> + if (!amdgpu_vm_pt_descendant(adev, &cursor)) >>>>>>>>>> >> + return -ENOENT; >>>>>>>>>> >> + } >>>>>>>>>> >> + >>>>>>>>>> >> + while (cursor.pfn < end) { >>>>>>>>>> >> + amdgpu_vm_free_table(cursor.entry); >>>>>>>>>> >> + num_entries = amdgpu_vm_num_entries(adev, >>>>>>>>>> cursor.level - 1); >>>>>>>>>> >> + >>>>>>>>>> >> + if (cursor.entry != >>>>>>>>>> &cursor.parent->entries[num_entries - 1]) { >>>>>>>>>> >> + /* Next ptb entry */ >>>>>>>>>> >> + shift = amdgpu_vm_level_shift(adev, >>>>>>>>>> cursor.level - 1); >>>>>>>>>> >> + cursor.pfn += 1ULL << shift; >>>>>>>>>> >> + cursor.pfn &= ~((1ULL << shift) - 1); >>>>>>>>>> >> + cursor.entry++; >>>>>>>>>> >> + } else { >>>>>>>>>> >> + /* Next ptb entry in next pd0 entry */ >>>>>>>>>> >> + amdgpu_vm_pt_ancestor(&cursor); >>>>>>>>>> >> + shift = amdgpu_vm_level_shift(adev, >>>>>>>>>> cursor.level - 1); >>>>>>>>>> >> + cursor.pfn += 1ULL << shift; >>>>>>>>>> >> + cursor.pfn &= ~((1ULL << shift) - 1); >>>>>>>>>> >> + amdgpu_vm_pt_descendant(adev, &cursor); >>>>>>>>>> >> + } >>>>>>>>>> >> + } >>>>>>>>>> >> + >>>>>>>>>> >> + return 0; >>>>>>>>>> >> +} >>>>>>>>>> >> + >>>>>>>>>> >> +/** >>>>>>>>>> >> * amdgpu_vm_clear_freed - clear freed BOs in >>>>>>>>>> the PT >>>>>>>>>> >> * >>>>>>>>>> >> * @adev: amdgpu_device pointer >>>>>>>>>> >> @@ -1949,7 +1994,6 @@ int >>>>>>>>>> amdgpu_vm_clear_freed(struct amdgpu_device >>>>>>>>>> >> *adev, >>>>>>>>>> >> struct dma_fence **fence) >>>>>>>>>> >> { >>>>>>>>>> >> struct amdgpu_bo_va_mapping *mapping; >>>>>>>>>> >> - uint64_t init_pte_value = 0; >>>>>>>>>> >> struct dma_fence *f = NULL; >>>>>>>>>> >> int r; >>>>>>>>>> >> @@ -1958,13 +2002,10 @@ int >>>>>>>>>> amdgpu_vm_clear_freed(struct >>>>>>>>>> >> amdgpu_device *adev, >>>>>>>>>> >> struct amdgpu_bo_va_mapping, list); >>>>>>>>>> >> list_del(&mapping->list); >>>>>>>>>> >> - if (vm->pte_support_ats && >>>>>>>>>> >> - mapping->start < AMDGPU_GMC_HOLE_START) >>>>>>>>>> >> - init_pte_value = AMDGPU_PTE_DEFAULT_ATC; >>>>>>>>>> >> + r = amdgpu_vm_remove_ptes(adev, vm, >>>>>>>>>> >> + (mapping->start + 0x1ff) & (~0x1ffll), >>>>>>>>>> >> + (mapping->last + 1) & (~0x1ffll)); >>>>>>>>>> >> - r = amdgpu_vm_bo_update_mapping(adev, >>>>>>>>>> vm, false, NULL, >>>>>>>>>> >> - mapping->start, mapping->last, >>>>>>>>>> >> - init_pte_value, 0, NULL, &f); >>>>>>>>>> >> amdgpu_vm_free_mapping(adev, vm, mapping, f); >>>>>>>>>> >> if (r) { >>>>>>>>>> >> dma_fence_put(f); >>>>>>>>>> >> @@ -1980,7 +2021,6 @@ int >>>>>>>>>> amdgpu_vm_clear_freed(struct amdgpu_device >>>>>>>>>> >> *adev, >>>>>>>>>> >> } >>>>>>>>>> >> return 0; >>>>>>>>>> >> - >>>>>>>>>> >> } >>>>>>>>>> >> /** >>>>>>>>>> > >>>>>>>>>> > _______________________________________________ >>>>>>>>>> > amd-gfx mailing list >>>>>>>>>> > amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>>>>>>>> <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> >>>>>>>>>> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> amd-gfx mailing list >>>>>>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>>> _______________________________________________ >>>> amd-gfx mailing list >>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx