On 16.01.19 at 15:39, Marek Olšák wrote:
On 16.01.19 at 15:31, Marek Olšák wrote:
Well, if you ask me, we should have the following interface for negotiating memory management with the kernel:

1. We have per-process BOs which can't be shared between processes. Those are always valid and don't need to be mentioned in any BO list whatsoever.

If we know that a per-process BO is currently not in use, we can optionally tell the kernel so, to make memory management more efficient.

In other words, instead of a list of the stuff which is used, we send down to the kernel a list of the stuff which is not used any more, and that only when we know it is necessary, e.g. when a game or application overcommits.
Radeonsi doesn't use this because this approach caused performance degradation and also drops BO priorities.
The performance degradation was mostly due to shortcomings in the LRU, which have by now been fixed.

BO priorities are a different topic, but they could be added to per-VM BOs as well.
What's the minimum drm version that contains the fixes?
I've pushed the last optimization this morning. No idea when it really became useful, but the numbers from the closed source clients now look much better.

We should probably test and bump the drm version when we are sure that this now works as expected.
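For illustration, a minimal sketch of how userspace could gate on that once the version is bumped, assuming the check is simply the amdgpu DRM minor version reported by amdgpu_device_initialize(); the helper name and the required minor are hypothetical, since the exact version is not decided in this thread:

#include <stdbool.h>
#include <stdint.h>
#include <amdgpu.h>

/* Sketch: gate the "don't list per-VM BOs per CS" path on the amdgpu DRM
 * minor version reported at initialization. 'required_minor' stands in for
 * whatever minor version ends up containing the LRU fixes. */
static bool per_vm_bo_path_usable(int fd, uint32_t required_minor,
                                  amdgpu_device_handle *dev)
{
        uint32_t major, minor;

        if (amdgpu_device_initialize(fd, &major, &minor, dev))
                return false;

        return minor >= required_minor;
}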
Christian.
Marek
Christian.
Marek
2. We have shared BOs which are used by more than one process. Those are rare and should be added to the per-CS list of BOs in use.

The whole BO list interface Marek tries to optimize here should be deprecated and not used any more.
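As an aside, a minimal sketch of what allocating such a per-process ("always valid") BO could look like through the existing libdrm_amdgpu API, assuming the kernel's AMDGPU_GEM_CREATE_VM_ALWAYS_VALID flag is the mechanism behind point 1; error handling is omitted:

#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Sketch: allocate a BO that is always valid in this process' VM and
 * therefore never has to appear in a per-CS BO list. Such a BO cannot be
 * shared with other processes. */
static amdgpu_bo_handle alloc_per_process_bo(amdgpu_device_handle dev,
                                             uint64_t size)
{
        struct amdgpu_bo_alloc_request req = {
                .alloc_size = size,
                .phys_alignment = 4096,
                .preferred_heap = AMDGPU_GEM_DOMAIN_VRAM,
                .flags = AMDGPU_GEM_CREATE_VM_ALWAYS_VALID,
        };
        amdgpu_bo_handle bo = NULL;

        amdgpu_bo_alloc(dev, &req, &bo);
        return bo; /* NULL on failure */
}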
Regards,
Christian.
On 16.01.19 at 13:46, Bas Nieuwenhuizen wrote:
> So random questions:
>
> 1) In this discussion it was mentioned that some Vulkan drivers still use the bo_list interface. I think that implies radv, as I think we're still using bo_list. Is there any other API we should be using? (Also, with VK_EXT_descriptor_indexing I suspect we'll be moving more towards a global bo list instead of a per-command-buffer one, as we cannot know all the referenced BOs anymore, but I'm not sure what the end state here will be.)
>
> 2) The other alternative mentioned was adding the buffers directly into the submit ioctl. Is this the desired end state (though, as above, I'm not sure how that works for vulkan)? If yes, what is the timeline for this, such that we need something in the interim?
>
> 3) Did we measure any performance benefit?
>
> In general I'd like to ack the raw bo list creation function, as this interface seems easier to use. The two-arrays thing has always been kind of a pain when we want to use e.g. builtin sort functions to make sure we have no duplicate BOs, but I have some comments below.
>
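A single entry array does make deduplication straightforward; for illustration, a sketch of what that could look like on top of the raw interface proposed below (helper names are made up for the example):

#include <stdint.h>
#include <stdlib.h>
#include <amdgpu_drm.h>

/* Sketch: with one array of drm_amdgpu_bo_list_entry, duplicates can be
 * removed with qsort() plus a single linear pass, keeping the highest
 * priority for a duplicated handle. Returns the new entry count. */
static int entry_cmp(const void *a, const void *b)
{
        const struct drm_amdgpu_bo_list_entry *ea = a, *eb = b;

        return (ea->bo_handle > eb->bo_handle) - (ea->bo_handle < eb->bo_handle);
}

static uint32_t dedup_bo_entries(struct drm_amdgpu_bo_list_entry *e, uint32_t n)
{
        uint32_t out = 0;

        if (n == 0)
                return 0;

        qsort(e, n, sizeof(*e), entry_cmp);
        for (uint32_t i = 1; i < n; i++) {
                if (e[i].bo_handle == e[out].bo_handle) {
                        if (e[i].bo_priority > e[out].bo_priority)
                                e[out].bo_priority = e[i].bo_priority;
                } else {
                        e[++out] = e[i];
                }
        }
        return out + 1;
}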
> On Mon, Jan 7, 2019 at 8:31 PM Marek Olšák <maraeo@xxxxxxxxx> wrote:
>> From: Marek Olšák <marek.olsak@xxxxxxx>
>>
>> ---
>>  amdgpu/amdgpu-symbol-check |  3 ++
>>  amdgpu/amdgpu.h            | 56 +++++++++++++++++++++++++++++++++++++-
>>  amdgpu/amdgpu_bo.c         | 36 ++++++++++++++++++++++++
>>  amdgpu/amdgpu_cs.c         | 25 +++++++++++++++++
>>  4 files changed, 119 insertions(+), 1 deletion(-)
>>
>> diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
>> index 6f5e0f95..96a44b40 100755
>> --- a/amdgpu/amdgpu-symbol-check
>> +++ b/amdgpu/amdgpu-symbol-check
>> @@ -12,20 +12,22 @@ _edata
>> _end
>> _fini
>> _init
>> amdgpu_bo_alloc
>> amdgpu_bo_cpu_map
>> amdgpu_bo_cpu_unmap
>> amdgpu_bo_export
>> amdgpu_bo_free
>> amdgpu_bo_import
>> amdgpu_bo_inc_ref
>> +amdgpu_bo_list_create_raw
>> +amdgpu_bo_list_destroy_raw
>> amdgpu_bo_list_create
>> amdgpu_bo_list_destroy
>> amdgpu_bo_list_update
>> amdgpu_bo_query_info
>> amdgpu_bo_set_metadata
>> amdgpu_bo_va_op
>> amdgpu_bo_va_op_raw
>> amdgpu_bo_wait_for_idle
>> amdgpu_create_bo_from_user_mem
>> amdgpu_cs_chunk_fence_info_to_data
>> @@ -40,20 +42,21 @@ amdgpu_cs_destroy_semaphore
>> amdgpu_cs_destroy_syncobj
>> amdgpu_cs_export_syncobj
>> amdgpu_cs_fence_to_handle
>> amdgpu_cs_import_syncobj
>> amdgpu_cs_query_fence_status
>> amdgpu_cs_query_reset_state
>> amdgpu_query_sw_info
>> amdgpu_cs_signal_semaphore
>> amdgpu_cs_submit
>> amdgpu_cs_submit_raw
>> +amdgpu_cs_submit_raw2
>> amdgpu_cs_syncobj_export_sync_file
>> amdgpu_cs_syncobj_import_sync_file
>> amdgpu_cs_syncobj_reset
>> amdgpu_cs_syncobj_signal
>> amdgpu_cs_syncobj_wait
>> amdgpu_cs_wait_fences
>> amdgpu_cs_wait_semaphore
>> amdgpu_device_deinitialize
>> amdgpu_device_initialize
>> amdgpu_find_bo_by_cpu_mapping
>> diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
>> index dc51659a..5b800033 100644
>> --- a/amdgpu/amdgpu.h
>> +++ b/amdgpu/amdgpu.h
>> @@ -35,20 +35,21 @@
>> #define _AMDGPU_H_
>>
>> #include <stdint.h>
>> #include <stdbool.h>
>>
>> #ifdef __cplusplus
>> extern "C" {
>> #endif
>>
>> struct drm_amdgpu_info_hw_ip;
>> +struct drm_amdgpu_bo_list_entry;
>>
>>  /*--------------------------------------------------------------------------*/
>>  /* --------------------------- Defines ------------------------------------ */
>>  /*--------------------------------------------------------------------------*/
>>
>>  /**
>>   * Define max. number of Command Buffers (IB) which could be sent to the single
>>   * hardware IP to accommodate CE/DE requirements
>> *
>> * \sa amdgpu_cs_ib_info
>> @@ -767,34 +768,65 @@ int amdgpu_bo_cpu_unmap(amdgpu_bo_handle buf_handle);
>>   *                            and no GPU access is scheduled.
>>   *                          1 GPU access is in fly or scheduled
>>   *
>>   * \return   0 - on success
>>   *          <0 - Negative POSIX Error code
>>   */
>>  int amdgpu_bo_wait_for_idle(amdgpu_bo_handle buf_handle,
>>                              uint64_t timeout_ns,
>>                              bool *buffer_busy);
>>
>> +/**
>> + * Creates a BO list handle for command submission.
>> + *
>> + * \param   dev                 - \c [in] Device handle.
>> + *                                See #amdgpu_device_initialize()
>> + * \param   number_of_buffers   - \c [in] Number of BOs in the list
>> + * \param   buffers             - \c [in] List of BO handles
>> + * \param   result              - \c [out] Created BO list handle
>> + *
>> + * \return   0 on success\n
>> + *          <0 - Negative POSIX Error code
>> + *
>> + * \sa amdgpu_bo_list_destroy_raw()
>> +*/
>> +int amdgpu_bo_list_create_raw(amdgpu_device_handle dev,
>> +                              uint32_t number_of_buffers,
>> +                              struct drm_amdgpu_bo_list_entry *buffers,
>> +                              uint32_t *result);
> So AFAIU drm_amdgpu_bo_list_entry takes a raw bo handle, while we never get a raw bo handle from libdrm_amdgpu. How are we supposed to fill it in?
>
> What do we win by having the raw handle for the bo_list? If we would not return the raw handle, we would not need the submit_raw2.
>
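For illustration, one way the entries could presumably be filled today is by exporting each BO's KMS/GEM handle with amdgpu_bo_export(); a sketch combining that with the new raw entry points from this patch (helper name hypothetical, error handling minimal):

#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Sketch: fill raw entries from opaque amdgpu_bo_handle objects, create the
 * raw list, submit with it, then destroy it. Assumes the
 * amdgpu_bo_list_create_raw()/amdgpu_cs_submit_raw2()/
 * amdgpu_bo_list_destroy_raw() entry points added by this patch. */
static int submit_with_raw_list(amdgpu_device_handle dev,
                                amdgpu_context_handle ctx,
                                amdgpu_bo_handle *bos, uint32_t num_bos,
                                struct drm_amdgpu_cs_chunk *chunks,
                                int num_chunks, uint64_t *seq_no)
{
        struct drm_amdgpu_bo_list_entry entries[num_bos];
        uint32_t list_handle;
        int r;

        for (uint32_t i = 0; i < num_bos; i++) {
                /* amdgpu_bo_handle is opaque; the raw GEM handle comes from
                 * amdgpu_bo_export() with amdgpu_bo_handle_type_kms. */
                r = amdgpu_bo_export(bos[i], amdgpu_bo_handle_type_kms,
                                     &entries[i].bo_handle);
                if (r)
                        return r;
                entries[i].bo_priority = 0;
        }

        r = amdgpu_bo_list_create_raw(dev, num_bos, entries, &list_handle);
        if (r)
                return r;

        r = amdgpu_cs_submit_raw2(dev, ctx, list_handle, num_chunks, chunks,
                                  seq_no);
        amdgpu_bo_list_destroy_raw(dev, list_handle);
        return r;
}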
>> +
>> +/**
>> + * Destroys a BO list handle.
>> + *
>> + * \param   bo_list   - \c [in] BO list handle.
>> + *
>> + * \return   0 on success\n
>> + *          <0 - Negative POSIX Error code
>> + *
>> + * \sa amdgpu_bo_list_create_raw(), amdgpu_cs_submit_raw2()
>> +*/
>> +int amdgpu_bo_list_destroy_raw(amdgpu_device_handle dev, uint32_t bo_list);
>> +
>> /**
>>   * Creates a BO list handle for command submission.
>>   *
>>   * \param   dev                 - \c [in] Device handle.
>>   *                                See #amdgpu_device_initialize()
>>   * \param   number_of_resources - \c [in] Number of BOs in the list
>>   * \param   resources           - \c [in] List of BO handles
>>   * \param   resource_prios      - \c [in] Optional priority for each handle
>>   * \param   result              - \c [out] Created BO list handle
>>   *
>>   * \return   0 on success\n
>>   *          <0 - Negative POSIX Error code
>>   *
>> - * \sa amdgpu_bo_list_destroy()
>> + * \sa amdgpu_bo_list_destroy(), amdgpu_cs_submit_raw2()
>>  */
>>  int amdgpu_bo_list_create(amdgpu_device_handle dev,
>>                            uint32_t number_of_resources,
>>                            amdgpu_bo_handle *resources,
>>                            uint8_t *resource_prios,
>>                            amdgpu_bo_list_handle *result);
>>
>> /**
>> * Destroys a BO list handle.
>> *
>> @@ -1580,20 +1612,42 @@ struct drm_amdgpu_cs_chunk;
>>  struct drm_amdgpu_cs_chunk_dep;
>>  struct drm_amdgpu_cs_chunk_data;
>>
>>  int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
>>                           amdgpu_context_handle context,
>>                           amdgpu_bo_list_handle bo_list_handle,
>>                           int num_chunks,
>>                           struct drm_amdgpu_cs_chunk *chunks,
>>                           uint64_t *seq_no);
>>
>> +/**
>> + * Submit raw command submission to the kernel with a raw BO list handle.
>> + *
>> + * \param   dev            - \c [in] device handle
>> + * \param   context        - \c [in] context handle for context id
>> + * \param   bo_list_handle - \c [in] raw bo list handle (0 for none)
>> + * \param   num_chunks     - \c [in] number of CS chunks to submit
>> + * \param   chunks         - \c [in] array of CS chunks
>> + * \param   seq_no         - \c [out] output sequence number for submission.
>> + *
>> + * \return   0 on success\n
>> + *          <0 - Negative POSIX Error code
>> + *
>> + * \sa amdgpu_bo_list_create_raw(), amdgpu_bo_list_destroy_raw()
>> + */
>> +int amdgpu_cs_submit_raw2(amdgpu_device_handle dev,
>> +                          amdgpu_context_handle context,
>> +                          uint32_t bo_list_handle,
>> +                          int num_chunks,
>> +                          struct drm_amdgpu_cs_chunk *chunks,
>> +                          uint64_t *seq_no);
>> +
>>  void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
>>                                    struct drm_amdgpu_cs_chunk_dep *dep);
>>  void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
>>                                          struct drm_amdgpu_cs_chunk_data *data);
>>
>> /**
>> * Reserve VMID
>>   * \param context - \c [in] GPU Context
>> * \param flags - \c [in] TBD
>> *
>> diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
>> index c0f42e81..21bc73aa 100644
>> --- a/amdgpu/amdgpu_bo.c
>> +++ b/amdgpu/amdgpu_bo.c
>> @@ -611,20 +611,56 @@ drm_public int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
>>         pthread_mutex_lock(&dev->bo_table_mutex);
>>         r = handle_table_insert(&dev->bo_handles, (*buf_handle)->handle,
>>                                 *buf_handle);
>>         pthread_mutex_unlock(&dev->bo_table_mutex);
>>         if (r)
>>                 amdgpu_bo_free(*buf_handle);
>> out:
>>         return r;
>>  }
>>
>> +drm_public int amdgpu_bo_list_create_raw(amdgpu_device_handle dev,
>> +                                         uint32_t number_of_buffers,
>> +                                         struct drm_amdgpu_bo_list_entry *buffers,
>> +                                         uint32_t *result)
>> +{
>> +       union drm_amdgpu_bo_list args;
>> +       int r;
>> +
>> +       memset(&args, 0, sizeof(args));
>> +       args.in.operation = AMDGPU_BO_LIST_OP_CREATE;
>> +       args.in.bo_number = number_of_buffers;
>> +       args.in.bo_info_size = sizeof(struct drm_amdgpu_bo_list_entry);
>> +       args.in.bo_info_ptr = (uint64_t)(uintptr_t)buffers;
>> +
>> +       r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_BO_LIST,
>> +                               &args, sizeof(args));
>> +       if (r)
>> +               return r;
>> +
>> +       *result = args.out.list_handle;
>> +       return 0;
>> +}
>> +
>> +drm_public int amdgpu_bo_list_destroy_raw(amdgpu_device_handle dev,
>> +                                          uint32_t bo_list)
>> +{
>> +       union drm_amdgpu_bo_list args;
>> +
>> +       memset(&args, 0, sizeof(args));
>> +       args.in.operation = AMDGPU_BO_LIST_OP_DESTROY;
>> +       args.in.list_handle = bo_list;
>> +
>> +       return drmCommandWriteRead(dev->fd, DRM_AMDGPU_BO_LIST,
>> +                                  &args, sizeof(args));
>> +}
>> +
>>  drm_public int amdgpu_bo_list_create(amdgpu_device_handle dev,
>>                                       uint32_t number_of_resources,
>>                                       amdgpu_bo_handle *resources,
>>                                       uint8_t *resource_prios,
>>                                       amdgpu_bo_list_handle *result)
>>  {
>>         struct drm_amdgpu_bo_list_entry *list;
>>         union drm_amdgpu_bo_list args;
>>         unsigned i;
>>         int r;
>> diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
>> index 3b8231aa..5bedf748 100644
>> --- a/amdgpu/amdgpu_cs.c
>> +++ b/amdgpu/amdgpu_cs.c
>> @@ -724,20 +724,45 @@ drm_public int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
>>         r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_CS,
>>                                 &cs, sizeof(cs));
>>         if (r)
>>                 return r;
>>
>>         if (seq_no)
>>                 *seq_no = cs.out.handle;
>>         return 0;
>>  }
>>
>> +drm_public int amdgpu_cs_submit_raw2(amdgpu_device_handle dev,
>> +                                     amdgpu_context_handle context,
>> +                                     uint32_t bo_list_handle,
>> +                                     int num_chunks,
>> +                                     struct drm_amdgpu_cs_chunk *chunks,
>> +                                     uint64_t *seq_no)
>> +{
>> +       union drm_amdgpu_cs cs = {0};
>> +       uint64_t *chunk_array;
>> +       int i, r;
>> +
>> +       chunk_array = alloca(sizeof(uint64_t) * num_chunks);
>> +       for (i = 0; i < num_chunks; i++)
>> +               chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
>> +       cs.in.chunks = (uint64_t)(uintptr_t)chunk_array;
>> +       cs.in.ctx_id = context->id;
>> +       cs.in.bo_list_handle = bo_list_handle;
>> +       cs.in.num_chunks = num_chunks;
>> +       r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_CS,
>> +                               &cs, sizeof(cs));
>> +       if (!r && seq_no)
>> +               *seq_no = cs.out.handle;
>> +       return r;
>> +}
>> +
>>  drm_public void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
>>                                                     struct drm_amdgpu_cs_chunk_data *data)
>>  {
>>         data->fence_data.handle = fence_info->handle->handle;
>>         data->fence_data.offset = fence_info->offset * sizeof(uint64_t);
>>  }
>>
>>  drm_public void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
>>                                               struct drm_amdgpu_cs_chunk_dep *dep)
>> {
>> --
>> 2.17.1
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx