Re: [PATCH v5 06/10] drm/amdgpu: map usermode queue into MES

Shashank Sharma <shashank.sharma@xxxxxxx> · Thu, 6 Jul 2023 19:32:46 +0200

On 06/07/2023 19:26, Alex Deucher wrote:
On Thu, Jul 6, 2023 at 1:15 PM Shashank Sharma <shashank.sharma@xxxxxxx> wrote:

On 06/07/2023 18:52, Alex Deucher wrote:
On Thu, Jul 6, 2023 at 8:36 AM Shashank Sharma <shashank.sharma@xxxxxxx> wrote:
This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue would  be considered ready to accept the workload.

V1: Addressed review comments from Alex on the RFC patch series
      - Map/Unmap should be IP specific.
V2:
      Addressed review comments from Christian:
      - Fix the wptr_mc_addr calculation (moved into another patch)
      Addressed review comments from Alex:
      - Do not add fptrs for map/unmap

V3: Integration with doorbell manager
V4: Rebase
V5: Use gfx_v11_0 for function names (Alex)

Cc: Alex Deucher <alexander.deucher@xxxxxxx>
Cc: Christian Koenig <christian.koenig@xxxxxxx>
Signed-off-by: Shashank Sharma <shashank.sharma@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 70 ++++++++++++++++++++++++++
   1 file changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 7d3b19e08bbb..b4a0f26a0e8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -6491,6 +6491,65 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
          .funcs = &gfx_v11_0_ip_funcs,
   };

+static void gfx_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
+                                 struct amdgpu_usermode_queue *queue)
+{
+       struct amdgpu_device *adev = uq_mgr->adev;
+       struct mes_remove_queue_input queue_input;
+       int r;
+
+       memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
+       queue_input.doorbell_offset = queue->doorbell_index;
+       queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
+
+       amdgpu_mes_lock(&adev->mes);
+       r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
+       amdgpu_mes_unlock(&adev->mes);
+       if (r)
+               DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
+}
+
+static int gfx_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
+                              struct amdgpu_usermode_queue *queue,
+                              struct amdgpu_mqd_prop *userq_props)
+{
+       struct amdgpu_device *adev = uq_mgr->adev;
+       struct mes_add_queue_input queue_input;
+       int r;
+
+       memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
+
+       queue_input.process_va_start = 0;
+       queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
+       queue_input.process_quantum = 100000; /* 10ms */
+       queue_input.gang_quantum = 10000; /* 1ms */
+       queue_input.paging = false;
+
+       queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
+       queue_input.process_context_addr = queue->proc_ctx_gpu_addr;
+       queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+       queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
Was there an option in the MQD to specify a priority?
I checked the gfx_v11_MQD structure and this MQD does have an option to
specify the priority of a queue (offset 134), but as we are re-using the
mqd_init function from gfx_v11_ip_funcs which sets this offset to 0 by
default, its not being used.

We can add a parameter for queue priority and overwrite the init values.

The priority which we are setting here in this function, is for queue
mapping using MES, and its the gang priority.
Thinking about this more, the priority would come from the context.
E.g., ctx->init_priority and ctx->override_priority (see
amdgpu_ctx_init()).  We should take that into account when creating
the queue.

In the current design, the userqueue is completely independent of the 
GFX ctx (we discussed this in V2 I think, and that's when we introduced 
the user_mgr). I agree that we should consider the queue priority, but 
we might have to get this parameter specifically from the mqd_user_in.



   What about
secure settings?  If not, we should validate those flags properly and
return an error if they are not currently supported.
+
+       queue_input.process_id = queue->vm->pasid;
+       queue_input.queue_type = queue->queue_type;
+       queue_input.mqd_addr = queue->mqd.gpu_addr;
+       queue_input.wptr_addr = userq_props->wptr_gpu_addr;
+       queue_input.queue_size = userq_props->queue_size >> 2;
Do we validate the size anywhere?
We are validating the whole structure/user_MQD size, but not
specifically queue size. But based on your suggestion on libDRM UAPI, we
are planing to add an USERQ_INFO_IOCTL in a separate patch series, which
will then introduce the IP based dynamic size checking, and also the
checks related to alignment and queue size.
We just want to protect from userspace doing something crazy like
making a 10M queue or something like that.  We should add an interface
to query the sizes per IP, but we need to validate the inputs as well.

Agree, I will add this queue size validation check in this series 
itself, will cross check some other inputs as well.

- Shashank


Alex

- Shashank

+       queue_input.doorbell_offset = userq_props->doorbell_index;
+       queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+
+       amdgpu_mes_lock(&adev->mes);
+       r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
+       amdgpu_mes_unlock(&adev->mes);
+       if (r) {
+               DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
+               return r;
+       }
+
+       DRM_DEBUG_DRIVER("Queue (doorbell:%d) mapped successfully\n", userq_props->doorbell_index);
+       return 0;
+}
+
   static void gfx_v11_0_userq_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
                                                struct amdgpu_usermode_queue *queue)
   {
@@ -6601,8 +6660,18 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
                  goto free_mqd;
          }

+       /* Map userqueue into FW using MES */
+       r = gfx_v11_0_userq_map(uq_mgr, queue, &userq_props);
+       if (r) {
+               DRM_ERROR("Failed to init MQD\n");
+               goto free_ctx;
+       }
+
          return 0;

+free_ctx:
+       gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
+
   free_mqd:
          amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr, &queue->mqd.cpu_ptr);
          return r;
@@ -6613,6 +6682,7 @@ gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_userm
   {
          struct amdgpu_userq_obj *mqd = &queue->mqd;

+       gfx_v11_0_userq_unmap(uq_mgr, queue);
          gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
          amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
   }
--
2.40.1