On 2024-04-17 10:32, Paneer Selvam, Arunpravin wrote:
Hi Christian,
On 4/17/2024 6:57 PM, Paneer Selvam, Arunpravin wrote:
Hi Christian, a small correction: when we set pages_per_block to the maximum, we don't set the min_block_size ourselves; the buddy allocator will set the min_block_size.
On 4/17/2024 12:19 PM, Christian König wrote:
On 17.04.24 at 08:21, Arunpravin Paneer Selvam wrote:
yes, for contiguous we don't have the 2MB limit, hence we set pages_per_block to the maximum to avoid restricting the min_block_size to the 2MB limit.
Now we have two flags for contiguous VRAM buffer allocation.
If the application requests AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the
buffer's placement function.
This patch will change the default behaviour of the two flags.
When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
- This means contiguous is not mandatory.
- we will try to allocate a contiguous buffer. If the
allocation fails, we fall back to allocating individual pages.
When we set TTM_PL_FLAG_CONTIGUOUS
- This means contiguous allocation is mandatory.
- we set this in amdgpu_bo_pin_restricted() before BO validation
and check this flag in the VRAM manager file.
- if this is set, we should allocate the buffer pages contiguously.
If the allocation fails, we return -ENOSPC.
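In short, the intended behaviour of the two flags can be sketched like this (a standalone illustration only; struct vram_buf, alloc_contiguous() and alloc_scattered() are made-up stand-ins, not amdgpu/TTM API):

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct vram_buf { unsigned long size; };

    /* pretend VRAM is too fragmented for a single contiguous block */
    static int alloc_contiguous(struct vram_buf *buf) { (void)buf; return -ENOSPC; }
    static int alloc_scattered(struct vram_buf *buf)  { (void)buf; return 0; }

    static int vram_request(struct vram_buf *buf, bool create_contiguous,
                            bool pin_contiguous)
    {
            if (create_contiguous || pin_contiguous) {
                    if (!alloc_contiguous(buf))
                            return 0;
                    /* pin path (TTM_PL_FLAG_CONTIGUOUS): contiguous is mandatory */
                    if (pin_contiguous)
                            return -ENOSPC;
            }
            /* AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS alone: fall back to scattered pages */
            return alloc_scattered(buf);
    }

    int main(void)
    {
            struct vram_buf buf = { .size = 8UL << 20 };

            printf("best effort: %d\n", vram_request(&buf, true, false)); /* 0 */
            printf("mandatory:   %d\n", vram_request(&buf, true, true));  /* -ENOSPC */
            return 0;
    }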
v2:
- keep the mem_flags and bo->flags check as is (Christian)
- place the TTM_PL_FLAG_CONTIGUOUS flag setting into the
amdgpu_bo_pin_restricted function placement range iteration
loop (Christian)
- rename find_pages to amdgpu_vram_mgr_calculate_pages_per_block
(Christian)
- keep the kernel BO allocation as is (Christian)
- if BO pin VRAM allocation fails, we need to return -ENOSPC as
RDMA cannot work with scattered VRAM pages (Philip)
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@xxxxxxx>
Suggested-by: Christian König <christian.koenig@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57 +++++++++++++++-----
2 files changed, 50 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8bc79924d171..caaef7b1df49 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -153,8 +153,10 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
else
places[c].flags |= TTM_PL_FLAG_TOPDOWN;
- if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
+ if (abo->tbo.type == ttm_bo_type_kernel &&
+ flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
+
c++;
}
@@ -966,6 +968,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
if (!bo->placements[i].lpfn ||
(lpfn && lpfn < bo->placements[i].lpfn))
bo->placements[i].lpfn = lpfn;
+
+ if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
+ bo->placements[i].mem_type == TTM_PL_VRAM)
+ bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
}
r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
Nice work, up till here that looks exactly right as far as I can see.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 8db880244324..4be8b091099a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -88,6 +88,29 @@ static inline u64 amdgpu_vram_mgr_blocks_size(struct list_head *head)
return size;
}
+static inline unsigned long
+amdgpu_vram_mgr_calculate_pages_per_block(struct ttm_buffer_object *tbo,
+ const struct ttm_place *place,
+ unsigned long bo_flags)
+{
+ unsigned long pages_per_block;
+
+ if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) {
+ pages_per_block = ~0ul;
If I understand it correctly, this enforces the allocation of a contiguous buffer by saying we should have only one giant page for the whole BO.
Here I removed the TTM_PL_FLAG_CONTIGUOUS check. AFAIU, in both cases (BO pinning and normal contiguous request) the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag is always set.
+ } else {
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ pages_per_block = HPAGE_PMD_NR;
+#else
+ /* default to 2MB */
+ pages_per_block = 2UL << (20UL - PAGE_SHIFT);
+#endif
+ pages_per_block = max_t(uint32_t, pages_per_block,
+ tbo->page_alignment);
+ }
+
+ return pages_per_block;
+}
+
/**
* DOC: mem_info_vram_total
*
@@ -451,8 +474,10 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
struct amdgpu_device *adev = to_amdgpu_device(mgr);
u64 vis_usage = 0, max_bytes, min_block_size;
+ struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
struct amdgpu_vram_mgr_resource *vres;
u64 size, remaining_size, lpfn, fpfn;
+ unsigned long bo_flags = bo->flags;
struct drm_buddy *mm = &mgr->mm;
struct drm_buddy_block *block;
unsigned long pages_per_block;
@@ -468,18 +493,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
if (tbo->type != ttm_bo_type_kernel)
max_bytes -= AMDGPU_VM_RESERVED_VRAM;
- if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
- pages_per_block = ~0ul;
- } else {
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- pages_per_block = HPAGE_PMD_NR;
-#else
- /* default to 2MB */
- pages_per_block = 2UL << (20UL - PAGE_SHIFT);
-#endif
- pages_per_block = max_t(uint32_t, pages_per_block,
- tbo->page_alignment);
- }
+ pages_per_block =
+ amdgpu_vram_mgr_calculate_pages_per_block(tbo, place,
+ bo_flags);
vres = kzalloc(sizeof(*vres), GFP_KERNEL);
if (!vres)
@@ -498,7 +514,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
if (place->flags & TTM_PL_FLAG_TOPDOWN)
vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
- if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
+ if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
vres->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
And this here just optimizes it in the way that it says try harder to make it look contiguous.
Question is if that also works with pages_per_block of 2MiB or does that only kick in when pages_per_block is maximal?
AFAIU, if the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag is set, we set pages_per_block to the maximum, thus we don't limit the BO. When pages_per_block is maximal, the min_block_size would be the tbo->page_alignment, and this tbo->page_alignment would be the same value as the required size. The required size can be less than 2MB or more than 2MB. Below we have the check size >= pages_per_block; when pages_per_block is maximal we don't limit the min_block_size (the buddy allocator sets it as roundup_pow_of_two(size)).
If we set a 2MB pages_per_block without the DRM_BUDDY_CONTIGUOUS_ALLOCATION flag, does the buddy allocator allocate from free 2MB blocks first, or does it split larger blocks into smaller ones? We want to keep the larger blocks for contiguous allocations.
Regards,
Philip
Thanks,
Arun.
Sure, I will try to rewrite the code.
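To make the size/min_block_size interplay discussed above concrete, here is a simplified, self-contained illustration; all sizes are in pages, and the helper and constants are stand-ins, not the real logic in amdgpu_vram_mgr_new() and drm_buddy:

    #include <stdio.h>

    static unsigned long min_block_size_pages(unsigned long size_pages,
                                              unsigned long pages_per_block,
                                              unsigned long page_alignment)
    {
            unsigned long min_pages = page_alignment;

            /*
             * The "size >= pages_per_block" check mentioned above: only raise
             * the minimum block size for allocations of at least
             * pages_per_block pages.  With pages_per_block = ~0ul (the
             * contiguous case) this never triggers, so the minimum stays at
             * tbo->page_alignment.
             */
            if (size_pages >= pages_per_block)
                    min_pages = pages_per_block;

            return min_pages;
    }

    int main(void)
    {
            /* non-contiguous 8MiB BO (2048 pages), 2MiB pages_per_block (512 pages) */
            printf("%lu pages\n", min_block_size_pages(2048, 512, 1));    /* 512 */

            /*
             * contiguous 8MiB BO: pages_per_block is maximal and, as noted
             * above, the page alignment equals the required size, so the
             * whole request becomes a single (power-of-two rounded) block.
             */
            printf("%lu pages\n", min_block_size_pages(2048, ~0UL, 2048)); /* 2048 */
            return 0;
    }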
if (fpfn || lpfn != mgr->mm.size)
@@ -529,8 +545,21 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
min_block_size,
&vres->blocks,
vres->flags);
- if (unlikely(r))
+ if (unlikely(r)) {
+ if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
+ !(place->flags & TTM_PL_FLAG_CONTIGUOUS)) {
+ /* Fallback to non contiguous allocation */
+ vres->flags &= ~DRM_BUDDY_CONTIGUOUS_ALLOCATION;
+ bo_flags &= ~AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
Well I would say that this is a rather hacky implementation and should be avoided if possible.
sure.
+
+ pages_per_block =
+ amdgpu_vram_mgr_calculate_pages_per_block(tbo,
+ place,
+ bo_flags);
Rather move the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS handling out of amdgpu_vram_mgr_calculate_pages_per_block().
Thanks,
Arun.
Regards,
Christian.
+ continue;
+ }
goto error_free_blocks;
+ }
if (size > remaining_size)
remaining_size = 0;
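Reading Christian's last two comments together, one possible shape for the rework (an abstract, standalone sketch with stand-in names, not an actual follow-up patch) is to keep the pages_per_block helper free of the contiguous special case and do the one-shot fallback at the call site with a local flag, instead of clearing AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS from bo_flags mid-allocation:

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* stand-in for the pages_per_block helper without the contiguous
     * special case: just the 2MB default vs. the page alignment */
    static unsigned long default_pages_per_block(unsigned long page_alignment)
    {
            const unsigned long two_mb = 512;       /* 2MB with 4K pages */

            return page_alignment > two_mb ? page_alignment : two_mb;
    }

    /* stand-in buddy allocator: pretend contiguous allocations always fail */
    static int buddy_alloc(unsigned long pages_per_block, bool contiguous)
    {
            (void)pages_per_block;
            return contiguous ? -ENOSPC : 0;
    }

    static int vram_new(unsigned long page_alignment, bool want_contiguous,
                        bool contiguous_mandatory)
    {
            bool try_contiguous = want_contiguous || contiguous_mandatory;
            int r;

    retry:
            r = buddy_alloc(try_contiguous ?
                            ~0ul : default_pages_per_block(page_alignment),
                            try_contiguous);
            if (r) {
                    /* fall back exactly once, and only if contiguous was optional */
                    if (try_contiguous && !contiguous_mandatory) {
                            try_contiguous = false;
                            goto retry;
                    }
                    return r;
            }
            return 0;
    }

    int main(void)
    {
            printf("optional contiguous:  %d\n", vram_new(1, true, false)); /* 0 */
            printf("mandatory contiguous: %d\n", vram_new(1, true, true));  /* -ENOSPC */
            return 0;
    }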