The hardware is vega10 and the test is KFDMemoryTest.BigBufferStressTest.
More detail is in Jira ticket SWDEV-201443.

Regards,
Eric

On 2019-10-31 10:08 a.m., StDenis, Tom wrote:
> I could try it on my carrizo/polaris setup. Is there a test procedure I
> could follow to trigger the changed code paths?
>
>
> Tom
>
> On 2019-10-31 6:41 a.m., Koenig, Christian wrote:
>> Just tested this and amdgpu_vm_update_ptes() indeed works as expected.
>>
>> When you free at least 2MB, the lowest level of page tables is freed
>> up again.
>>
>> BTW: What hardware have you tested this on? On gfx8 and older it is
>> expected that page tables are never freed.
>>
>> Regards,
>> Christian.
>>
>> On 30.10.19 at 19:11, Christian König wrote:
>>> Then I don't see how this patch actually changes anything.
>>>
>>> It could only be a bug in amdgpu_vm_update_ptes(). I'm going to
>>> investigate this, but I won't have time to look into the ticket in
>>> detail.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 30.10.19 at 19:00, Huang, JinHuiEric wrote:
>>>> Actually I do prevent removing in-use PTs with this:
>>>>
>>>> +               r = amdgpu_vm_remove_ptes(adev, vm,
>>>> +                               (mapping->start + 0x1ff) & (~0x1ffll),
>>>> +                               (mapping->last + 1) & (~0x1ffll));
>>>>
>>>> This only removes page tables for ranges aligned to 2MB. And I have
>>>> tested it at least on the KFD tests without anything breaking.
>>>>
>>>> By the way, I am not familiar with the memory stuff. This patch is
>>>> the best I can do for now. Could you take a look at the Jira ticket
>>>> SWDEV-201443 and find a better solution? Thanks!
>>>>
>>>> Regards,
>>>>
>>>> Eric
>>>>
>>>> On 2019-10-30 1:57 p.m., Christian König wrote:
>>>>> One thing I've forgotten:
>>>>>
>>>>> What you could maybe do to improve the situation is to join
>>>>> adjacent ranges in amdgpu_vm_clear_freed(), but I'm not sure what
>>>>> the chances are that the ranges are all freed together.
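To make the alignment in Eric's snippet above concrete: with 4KB pages and 512 PTEs per page-table block, one PTB covers 512 pfns, i.e. 2MB, which is why the mask is 0x1ff. The following is a minimal standalone sketch of that arithmetic (helper names are mine, not from the driver); it rounds the start pfn up and the exclusive end pfn down to PTB boundaries, so a sub-2MB range yields start > end and nothing is freed.

```c
#include <stdint.h>

/* 512 PTEs per page-table block with 4KB pages => 2MB, hence mask 0x1ff. */
#define PTB_MASK 0x1ffULL

/* Round a start pfn up to the next PTB-aligned (2MB) boundary. */
static uint64_t ptb_align_start(uint64_t start)
{
	return (start + PTB_MASK) & ~PTB_MASK;
}

/* Round the exclusive end pfn (last + 1) down to a PTB boundary. */
static uint64_t ptb_align_end(uint64_t last)
{
	return (last + 1) & ~PTB_MASK;
}
```

Only page tables whose full 512-pfn span lies inside the mapping are candidates for removal; anything partially covered is skipped by construction.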
>>>>>
>>>>> The only other alternative I can see would be to check the mappings
>>>>> of a range in amdgpu_update_ptes() and see if you could walk the
>>>>> tree up if the valid flag is not set and there are no mappings left
>>>>> for a page table.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 30.10.19 at 18:42, Koenig, Christian wrote:
>>>>>>> The valid flag doesn't take effect in this function.
>>>>>> That's irrelevant.
>>>>>>
>>>>>> See, what amdgpu_vm_update_ptes() does is to first determine the
>>>>>> fragment size:
>>>>>>> amdgpu_vm_fragment(params, frag_start, end, flags, &frag, &frag_end);
>>>>>> Then we walk down the tree:
>>>>>>> amdgpu_vm_pt_start(adev, params->vm, start, &cursor);
>>>>>>> while (cursor.pfn < end) {
>>>>>> And make sure that the page tables covering the address range are
>>>>>> actually allocated:
>>>>>>> r = amdgpu_vm_alloc_pts(params->adev, params->vm, &cursor);
>>>>>> Then we update the tables with the flags and addresses and free up
>>>>>> subsequent tables in the case of huge pages or freed-up areas:
>>>>>>> /* Free all child entries */
>>>>>>> while (cursor.pfn < frag_start) {
>>>>>>>         amdgpu_vm_free_pts(adev, params->vm, &cursor);
>>>>>>>         amdgpu_vm_pt_next(adev, &cursor);
>>>>>>> }
>>>>>> This is the maximum you can free, because all other page tables are
>>>>>> not completely covered by the range and so are potentially still in
>>>>>> use.
>>>>>>
>>>>>> And I have the strong suspicion that this is what your patch is
>>>>>> actually doing wrong. In other words, you are also freeing page
>>>>>> tables which are only partially covered by the range and so are
>>>>>> potentially still in use.
>>>>>>
>>>>>> Since we don't have any tracking of how many entries in a page
>>>>>> table are currently valid and how many are invalid, we actually
>>>>>> can't implement what you are trying to do here. So the patch is
>>>>>> definitely somehow broken.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
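Christian's invariant above ("the maximum you can free") boils down to a simple coverage check: a page table may only be freed if the updated range spans it entirely, since a partially covered table can still hold live mappings of other BOs. A minimal illustration of that predicate (the helper is mine, not driver code; `covered_shift` is the number of pfn bits one table covers):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: a page table whose naturally aligned base pfn is
 * pt_base and which covers 2^covered_shift pfns may be freed only if
 * the update range [start, end) spans it entirely.  Partially covered
 * tables may still hold live mappings and must be kept.
 */
static bool pt_fully_covered(uint64_t start, uint64_t end,
			     uint64_t pt_base, unsigned covered_shift)
{
	uint64_t pt_size = 1ULL << covered_shift;

	/* pt_base is assumed aligned to pt_size. */
	return start <= pt_base && pt_base + pt_size <= end;
}
```

With `covered_shift = 9` (a 512-entry PTB), a table is freeable only when the range covers all 512 of its pfns; this is exactly why tracking per-table valid-entry counts would be needed to free anything more.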
>>>>>>
>>>>>> On 30.10.19 at 17:55, Huang, JinHuiEric wrote:
>>>>>>> The valid flag doesn't take effect in this function.
>>>>>>> amdgpu_vm_alloc_pts() is always executed; it only depends on
>>>>>>> "cursor.pfn < end". The valid flag is only checked here for
>>>>>>> ASICs below GMC v9:
>>>>>>>
>>>>>>>         if (adev->asic_type < CHIP_VEGA10 &&
>>>>>>>             (flags & AMDGPU_PTE_VALID))...
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Eric
>>>>>>>
>>>>>>> On 2019-10-30 12:30 p.m., Koenig, Christian wrote:
>>>>>>>>
>>>>>>>> On 30.10.2019 17:19, "Huang, JinHuiEric"
>>>>>>>> <JinHuiEric.Huang@xxxxxxx> wrote:
>>>>>>>>
>>>>>>>>     I tested it and it saves a lot of vram in the KFD big buffer
>>>>>>>>     stress test. I think there are two reasons:
>>>>>>>>
>>>>>>>>     1. Calling amdgpu_vm_update_ptes() during unmapping will
>>>>>>>>     allocate unnecessary PTs, because there is no flag in
>>>>>>>>     amdgpu_vm_update_ptes() to determine whether the VA is being
>>>>>>>>     mapped or unmapped. This saves the most memory.
>>>>>>>>
>>>>>>>> That's not correct. The valid flag is used for this.
>>>>>>>>
>>>>>>>>     2. Intentionally removing those unmapped PTs is the logical
>>>>>>>>     expectation, although it doesn't remove that many PTs.
>>>>>>>>
>>>>>>>> Well, I actually don't see a change to what update_ptes is doing
>>>>>>>> and have the strong suspicion that the patch is simply broken.
>>>>>>>>
>>>>>>>> You either free page tables which are potentially still in use,
>>>>>>>> or update_ptes doesn't free page tables when the valid bit is
>>>>>>>> not set.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Eric
>>>>>>>>
>>>>>>>> On 2019-10-30 11:57 a.m., Koenig, Christian wrote:
>>>>>>>>
>>>>>>>> On 30.10.2019 16:47, "Kuehling, Felix"
>>>>>>>> <Felix.Kuehling@xxxxxxx> wrote:
>>>>>>>>
>>>>>>>>     On 2019-10-30 9:52 a.m., Christian König wrote:
>>>>>>>>     > Am 29.10.19 um 21:06 schrieb Huang, JinHuiEric:
>>>>>>>>     >> The issue is that PT BOs are not freed when unmapping VA,
>>>>>>>>     >> which causes accumulated vram usage to become huge in some
>>>>>>>>     >> memory stress tests, such as the kfd big buffer stress test.
>>>>>>>>     >> Function amdgpu_vm_bo_update_mapping() is called by both
>>>>>>>>     >> amdgpu_vm_bo_update() and amdgpu_vm_clear_freed(). The
>>>>>>>>     >> solution is replacing amdgpu_vm_bo_update_mapping() in
>>>>>>>>     >> amdgpu_vm_clear_freed() with a function that removes PT BOs,
>>>>>>>>     >> to save vram usage.
>>>>>>>>     >
>>>>>>>>     > NAK, that is intentional behavior.
>>>>>>>>     >
>>>>>>>>     > Otherwise we can run into out-of-memory situations when
>>>>>>>>     > page tables need to be allocated again under stress.
>>>>>>>>
>>>>>>>>     That's a bit arbitrary and inconsistent. We are freeing page
>>>>>>>>     tables in other situations, when a mapping uses huge pages in
>>>>>>>>     amdgpu_vm_update_ptes. Why not when a mapping is destroyed
>>>>>>>>     completely?
>>>>>>>>
>>>>>>>>     I'm actually a bit surprised that the huge-page handling in
>>>>>>>>     amdgpu_vm_update_ptes isn't kicking in to free up lower-level
>>>>>>>>     page tables when a BO is unmapped.
>>>>>>>>
>>>>>>>> Well, it does free the lower level, and that is already causing
>>>>>>>> problems (that's why I added the reserved space).
>>>>>>>>
>>>>>>>> What we don't do is free the higher levels.
>>>>>>>>
>>>>>>>> E.g. when you free a 2MB BO we free the lowest level; if we free
>>>>>>>> a 1GB BO we free the two lowest levels, etc...
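Christian's examples (2MB frees one level, 1GB frees two) follow from the 9-bit fan-out of each table level. A rough sketch of that relationship, assuming 4KB pages and 512 entries per level as on gfx9 (the function is illustrative, not from the driver):

```c
#include <stdint.h>

/*
 * Rough sketch, assuming 4KB pages and 512 entries (9 address bits)
 * per page-table level: a naturally aligned buffer of the given size
 * in bytes completely covers this many of the lowest table levels,
 * which are therefore candidates for freeing on unmap.  Higher,
 * partially covered levels are shared with other mappings and kept.
 */
static unsigned levels_fully_covered(uint64_t size)
{
	uint64_t covered = 1ULL << 21;	/* one PTB: 512 * 4KB = 2MB */
	unsigned levels = 0;

	while (size >= covered) {
		levels++;
		covered <<= 9;		/* next level spans 512x more */
	}
	return levels;
}
```

So a 2MB buffer covers exactly one PTB, a 1GB buffer covers a full PD0 entry chain (two levels), and anything below 2MB covers no complete table at all, which matches the "maximum saving of 2MB per VM" point below.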
>>>>>>>>
>>>>>>>> The problem with freeing the higher levels is that you don't
>>>>>>>> know who else is using them. E.g. we would need to check all
>>>>>>>> entries when we unmap one.
>>>>>>>>
>>>>>>>> It's simply not worth it for a maximum saving of 2MB per VM.
>>>>>>>>
>>>>>>>> Writing this, I'm actually wondering how you ended up with this
>>>>>>>> issue? There shouldn't be much savings from this.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>     Regards,
>>>>>>>>       Felix
>>>>>>>>
>>>>>>>>     >
>>>>>>>>     > Regards,
>>>>>>>>     > Christian.
>>>>>>>>     >
>>>>>>>>     >>
>>>>>>>>     >> Change-Id: Ic24e35bff8ca85265b418a642373f189d972a924
>>>>>>>>     >> Signed-off-by: Eric Huang <JinhuiEric.Huang@xxxxxxx>
>>>>>>>>     >> ---
>>>>>>>>     >>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 56 +++++++++++++++++++++++++++++-----
>>>>>>>>     >>   1 file changed, 48 insertions(+), 8 deletions(-)
>>>>>>>>     >>
>>>>>>>>     >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>     >> index 0f4c3b2..8a480c7 100644
>>>>>>>>     >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>     >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>     >> @@ -1930,6 +1930,51 @@ static void amdgpu_vm_prt_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>>>>>>     >>   }
>>>>>>>>     >>   /**
>>>>>>>>     >> + * amdgpu_vm_remove_ptes - free PT BOs
>>>>>>>>     >> + *
>>>>>>>>     >> + * @adev: amdgpu device structure
>>>>>>>>     >> + * @vm: amdgpu vm structure
>>>>>>>>     >> + * @start: start of mapped range
>>>>>>>>     >> + * @end: end of mapped entry
>>>>>>>>     >> + *
>>>>>>>>     >> + * Free the page table level.
>>>>>>>>     >> + */
>>>>>>>>     >> +static int amdgpu_vm_remove_ptes(struct amdgpu_device *adev,
>>>>>>>>     >> +		struct amdgpu_vm *vm, uint64_t start, uint64_t end)
>>>>>>>>     >> +{
>>>>>>>>     >> +	struct amdgpu_vm_pt_cursor cursor;
>>>>>>>>     >> +	unsigned shift, num_entries;
>>>>>>>>     >> +
>>>>>>>>     >> +	amdgpu_vm_pt_start(adev, vm, start, &cursor);
>>>>>>>>     >> +	while (cursor.level < AMDGPU_VM_PTB) {
>>>>>>>>     >> +		if (!amdgpu_vm_pt_descendant(adev, &cursor))
>>>>>>>>     >> +			return -ENOENT;
>>>>>>>>     >> +	}
>>>>>>>>     >> +
>>>>>>>>     >> +	while (cursor.pfn < end) {
>>>>>>>>     >> +		amdgpu_vm_free_table(cursor.entry);
>>>>>>>>     >> +		num_entries = amdgpu_vm_num_entries(adev, cursor.level - 1);
>>>>>>>>     >> +
>>>>>>>>     >> +		if (cursor.entry != &cursor.parent->entries[num_entries - 1]) {
>>>>>>>>     >> +			/* Next ptb entry */
>>>>>>>>     >> +			shift = amdgpu_vm_level_shift(adev, cursor.level - 1);
>>>>>>>>     >> +			cursor.pfn += 1ULL << shift;
>>>>>>>>     >> +			cursor.pfn &= ~((1ULL << shift) - 1);
>>>>>>>>     >> +			cursor.entry++;
>>>>>>>>     >> +		} else {
>>>>>>>>     >> +			/* Next ptb entry in next pd0 entry */
>>>>>>>>     >> +			amdgpu_vm_pt_ancestor(&cursor);
>>>>>>>>     >> +			shift = amdgpu_vm_level_shift(adev, cursor.level - 1);
>>>>>>>>     >> +			cursor.pfn += 1ULL << shift;
>>>>>>>>     >> +			cursor.pfn &= ~((1ULL << shift) - 1);
>>>>>>>>     >> +			amdgpu_vm_pt_descendant(adev, &cursor);
>>>>>>>>     >> +		}
>>>>>>>>     >> +	}
>>>>>>>>     >> +
>>>>>>>>     >> +	return 0;
>>>>>>>>     >> +}
>>>>>>>>     >> +
>>>>>>>>     >> +/**
>>>>>>>>     >>    * amdgpu_vm_clear_freed - clear freed BOs in the PT
>>>>>>>>     >>    *
>>>>>>>>     >>    * @adev: amdgpu_device pointer
>>>>>>>>     >> @@ -1949,7 +1994,6 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>>>>>>>>     >>   			  struct dma_fence **fence)
>>>>>>>>     >>   {
>>>>>>>>     >>   	struct amdgpu_bo_va_mapping *mapping;
>>>>>>>>     >> -	uint64_t init_pte_value = 0;
>>>>>>>>     >>   	struct dma_fence *f = NULL;
>>>>>>>>     >>   	int r;
>>>>>>>>     >> @@ -1958,13 +2002,10 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>>>>>>>>     >>   			struct amdgpu_bo_va_mapping, list);
>>>>>>>>     >>   		list_del(&mapping->list);
>>>>>>>>     >> -		if (vm->pte_support_ats &&
>>>>>>>>     >> -		    mapping->start < AMDGPU_GMC_HOLE_START)
>>>>>>>>     >> -			init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
>>>>>>>>     >> +		r = amdgpu_vm_remove_ptes(adev, vm,
>>>>>>>>     >> +				(mapping->start + 0x1ff) & (~0x1ffll),
>>>>>>>>     >> +				(mapping->last + 1) & (~0x1ffll));
>>>>>>>>     >> -		r = amdgpu_vm_bo_update_mapping(adev, vm, false, NULL,
>>>>>>>>     >> -					mapping->start, mapping->last,
>>>>>>>>     >> -					init_pte_value, 0, NULL, &f);
>>>>>>>>     >>   		amdgpu_vm_free_mapping(adev, vm, mapping, f);
>>>>>>>>     >>   		if (r) {
>>>>>>>>     >>   			dma_fence_put(f);
>>>>>>>>     >> @@ -1980,7 +2021,6 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>>>>>>>>     >>   	}
>>>>>>>>     >>   	return 0;
>>>>>>>>     >> -
>>>>>>>>     >>   }
>>>>>>>>     >>   /**
>>>>>>>>     >
>>>>>>>>     > _______________________________________________
>>>>>>>>     > amd-gfx mailing list
>>>>>>>>     > amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>>>>>>>     > https://lists.freedesktop.org/mailman/listinfo/amd-gfx