On Thu, Mar 13, 2025 at 6:21 PM Rodrigo Siqueira <siqueira@xxxxxxxxxx> wrote:
>
> On 03/13, Alex Deucher wrote:
> > To better evaluate user queues, add a module parameter
> > to disable kernel queues. With this set, kernel queues
> > are disabled and only user queues are available. This
> > frees up hardware resources for use in user queues which
> > would otherwise be used by kernel queues and provides
> > a way to validate user queues without the presence
> > of kernel queues.
>
> Hi Alex,
>
> I'm trying to understand how GFX and MES deal with different queues, and
> I used this patchset to guide me through that. In this sense, could you
> help me with the following points?
>
> FWIU, the GFX has what are called pipes, which in turn have hardware
> queues associated with them. For example, a GFX can have 2 pipes, and
> each pipe could have 2 hardware queues; or it could have 1 pipe and 8
> queues. Is this correct?

Right. For gfx, compute, and SDMA you have pipes (called instances on
SDMA) and queues. A pipe can only execute one queue at a time; the pipe
will switch between all of the mapped queues. You have storage in memory
(called an MQD -- Memory Queue Descriptor) which defines the state of the
queue (GPU virtual addresses of the queue itself, save areas, doorbell,
etc.). The queues that the pipe switches between are defined by HQDs
(Hardware Queue Descriptors). These are basically register-based storage
for the queues that the pipe can switch between. The driver sets up an
MQD for each queue that it creates. The MQDs are then handed to the MES
firmware for mapping. The MES firmware can map a queue as a legacy queue
(i.e., a kernel queue) or a user queue. The difference is that a legacy
queue is statically mapped to an HQD and is never preempted. User queues
are dynamically mapped to the HQDs by the MES firmware. If there are more
MQDs than HQDs, the MES firmware will preempt other user queues to make
sure each queue gets a time slice.

> (for this next part, suppose 1 pipe and 2 hardware queues)
> By default, one of the hardware queues is reserved for the Kernel Queue,
> and the user space could use the other. GFX has the MES block "connected"
> to all pipe queues, and MES is responsible for scheduling different ring
> buffers (in memory) in the pipe's hardware queue (effectively making the
> ring active). However, since the kernel queue is always present, MES
> only performs scheduling in one of the hardware queues. This scheduling
> occurs with the MES mapping and unmapping available Rings in memory to
> the hardware queue.
>
> Does the above description sound correct to you? How about the below
> diagram? Does it look correct to you?

More or less. The MES handles all of the queues (kernel or user). The
only real difference is that kernel queues are statically mapped to an
HQD while user queues are dynamically scheduled in the available HQDs
based on the level of over-subscription. E.g., if you have hardware with
1 pipe and 2 HQDs, you could have a kernel queue on 1 HQD and the MES
would schedule all of the user queues on the remaining 1 HQD. If you
don't enable any kernel queues, then you have 2 HQDs that the MES can use
for scheduling user queues.
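To make the MQD/HQD split concrete, here is a rough C sketch of the model
described above. It is purely illustrative -- the names (toy_mqd,
toy_hqd_slot, toy_map_legacy_queue, etc.) are made up and are not the
actual amdgpu or MES firmware interfaces:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HQDS_PER_PIPE 2	/* register-backed queue slots in one pipe */

/* Memory Queue Descriptor: per-queue state that lives in memory. */
struct toy_mqd {
	uint64_t ring_gpu_addr;		/* GPU VA of the ring buffer             */
	uint64_t save_area_addr;	/* where context is saved when preempted */
	uint32_t doorbell_index;	/* doorbell used to kick the queue       */
	bool	 is_kernel_queue;	/* legacy queue: pinned, never preempted */
};

/* Hardware Queue Descriptor slot: one of the pipe's register sets. */
struct toy_hqd_slot {
	struct toy_mqd *mapped;		/* MQD currently loaded, or NULL      */
	bool		pinned;		/* statically mapped kernel queue     */
};

struct toy_pipe {
	struct toy_hqd_slot hqd[TOY_HQDS_PER_PIPE];
};

/* Kernel (legacy) queue: statically claim an HQD, never evicted. */
static int toy_map_legacy_queue(struct toy_pipe *pipe, struct toy_mqd *mqd)
{
	for (size_t i = 0; i < TOY_HQDS_PER_PIPE; i++) {
		if (!pipe->hqd[i].mapped) {
			pipe->hqd[i].mapped = mqd;
			pipe->hqd[i].pinned = true;
			return 0;
		}
	}
	return -1;	/* no free HQD left for a static mapping */
}

/*
 * User queues: if there are more MQDs than free HQDs, time-slice them
 * through the non-pinned slots (a stand-in for what the MES firmware
 * does when the pipe is over-subscribed).
 */
static void toy_schedule_user_queues(struct toy_pipe *pipe,
				     struct toy_mqd **mqds, size_t count,
				     size_t *next)
{
	for (size_t i = 0; i < TOY_HQDS_PER_PIPE; i++) {
		if (pipe->hqd[i].pinned || count == 0)
			continue;	/* slot owned by a kernel queue */
		pipe->hqd[i].mapped = mqds[*next];	/* preempt + map next */
		*next = (*next + 1) % count;
	}
}

In this toy model, disable_kq=1 simply means nothing ever pins a slot, so
every HQD in the pipe stays available for rotating user queue MQDs.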
> (I hope the diagram looks fine in your email client; if not, I can
> attach a picture of it.)
>
> +--------------------------------------------------------------+
> | GFX                                                          |
> |            +------------------+                              |
> |            | (Hw Queue 0)     |  Kernel Queue (no eviction)  |
> |  PIPE 0 ---+------------------+    -> no MES scheduling      |
> |            | (Hw Queue 1)     |  User Queue                  |
> |            +------------------+    -> MES schedules          |
> +---------------------------------+----------------------------+
>                                   | Un/Map Ring
>                                   v
> +--------------------------------------------------------------+
> | MEMORY                                                       |
> |   +--------+   +--------+         +--------+                 |
> |   | Ring 0 |   | Ring 1 |   ...   | Ring N |                 |
> |   +--------+   +--------+         +--------+                 |
> +--------------------------------------------------------------+
>
> Is the idea in this series to experiment with making the kernel queue
> not fully occupy one of the hardware queues? By making the kernel queue
> able to be scheduled, this would provide one extra queue to be used for
> other things. Is this correct?

Right. This series paves the way for getting rid of kernel queues
altogether. Having no kernel queues leaves all of the resources
available to user queues.

> I'm unsure if I fully understand this series's idea; please correct me
> if I'm wrong.
>
> Also, please elaborate more on the type of tasks that the kernel queue
> handles. Tbh, I did not fully understand the idea behind it.

In the future of user queues, kernel queues would not be created or used
at all. Today, on most existing hardware, kernel queues are all that is
available. Today, when an application submits work to the kernel driver,
the kernel driver submits all of the application command buffers to
kernel queues. E.g., in most cases there is a single kernel GFX queue
and all applications which want to use the GFX engine funnel into that
queue. The CS IOCTL basically takes the command buffers from the
applications and schedules them on the kernel queue. With user queues,
each application will create its own user queues and will submit work
directly to its user queues. No need for an IOCTL for each submission,
no need to share a single kernel queue, etc.
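As a rough illustration of the submission-model difference (again with
hypothetical names -- toy_cs_submit_ioctl, toy_user_queue, etc. are not
the real amdgpu/libdrm interfaces): with kernel queues every submission
goes through an ioctl into the shared kernel ring, while with a user
queue the application writes packets into its own ring buffer and rings a
doorbell, with no kernel round trip per submission:

#include <stdint.h>
#include <string.h>

/* Kernel-queue model: each submit is an ioctl; the kernel driver places
 * the work on the single shared kernel GFX queue. */
struct toy_cs_request {
	uint64_t cmdbuf_gpu_addr;	/* application command buffer */
	uint32_t cmdbuf_num_dw;
};

/* Stand-in declaration for a CS-style ioctl entry point. */
int toy_cs_submit_ioctl(int drm_fd, const struct toy_cs_request *req);

/* User-queue model: the application owns the ring and submits directly. */
struct toy_user_queue {
	uint32_t	  *ring;	/* mapped ring buffer      */
	uint32_t	   ring_dw;	/* ring size in dwords     */
	uint32_t	   wptr;	/* write pointer in dwords */
	volatile uint32_t *doorbell;	/* mapped doorbell page    */
};

static void toy_user_queue_submit(struct toy_user_queue *q,
				  const uint32_t *pkt, uint32_t ndw)
{
	/* Copy the packet into the application's own ring
	 * (wrap handling omitted for brevity). */
	memcpy(&q->ring[q->wptr], pkt, ndw * sizeof(uint32_t));
	q->wptr = (q->wptr + ndw) % q->ring_dw;

	/* Ring the doorbell: hardware/MES picks the work up directly,
	 * no per-submission ioctl and no shared kernel queue. */
	*q->doorbell = q->wptr;
}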
Alex

>
> Thanks
>
> >
> > v2: use num_gfx_rings and num_compute_rings per
> >     Felix suggestion
> > v3: include num_gfx_rings fix in amdgpu_gfx.c
> > v4: additional fixes
> > v5: MEC EOP interrupt handling fix (Sunil)
> >
> > Alex Deucher (11):
> >   drm/amdgpu: add parameter to disable kernel queues
> >   drm/amdgpu: add ring flag for no user submissions
> >   drm/amdgpu/gfx: add generic handling for disable_kq
> >   drm/amdgpu/mes: centralize gfx_hqd mask management
> >   drm/amdgpu/mes: update hqd masks when disable_kq is set
> >   drm/amdgpu/mes: make more vmids available when disable_kq=1
> >   drm/amdgpu/gfx11: add support for disable_kq
> >   drm/amdgpu/gfx12: add support for disable_kq
> >   drm/amdgpu/sdma: add flag for tracking disable_kq
> >   drm/amdgpu/sdma6: add support for disable_kq
> >   drm/amdgpu/sdma7: add support for disable_kq
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h      |   1 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |   4 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |   9 ++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |   8 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   2 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c  |  30 ++--
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c  |  26 ++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h |   1 +
> >  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   | 191 ++++++++++++++++-------
> >  drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c   | 183 +++++++++++++++-------
> >  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c   |   2 +-
> >  drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c   |   2 +-
> >  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c   |  16 +-
> >  drivers/gpu/drm/amd/amdgpu/mes_v12_0.c   |  15 +-
> >  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c   |   4 +
> >  drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c   |   4 +
> >  17 files changed, 345 insertions(+), 155 deletions(-)
> >
> > --
> > 2.48.1
> >
>
> --
> Rodrigo Siqueira