Re: [PATCH] drm/amdgpu: Enable tunneling on high-priority compute queues

"Olsak, Marek" <Marek.Olsak@xxxxxxx> · Fri, 8 Dec 2023 14:49:31 +0000

[AMD Official Use Only - General]

Christian, firmware has nothing to do with it and doesn't control it. That was a wrong group of people to ping.
 It's only implemented in the SPI and tested by the SPI team and PAL team.

Marek

From: Koenig, Christian <Christian.Koenig@xxxxxxx>

Sent: December 8, 2023 09:38

To: Friedrich Vock <friedrich.vock@xxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>

Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Olsak, Marek <Marek.Olsak@xxxxxxx>

Subject: Re: [PATCH] drm/amdgpu: Enable tunneling on high-priority compute queues

Am 08.12.23 um 12:43 schrieb Friedrich Vock:

> On 08.12.23 10:51, Christian König wrote:

>> Well longer story short Alex and I have been digging up the

>> documentation for this and as far as we can tell this isn't correct.

> Huh. I initially talked to Marek about this, adding him in Cc.

Yeah, from the userspace side all you need to do is to set the bit as 

far as I can tell.

>>

>> You need to do quite a bit more before you can turn on this feature.

>> What userspace side do you refer to?

> I was referring to the Mesa merge request I made

> (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462).

> If/When you have more details about what else needs to be done, feel

> free to let me know.

For example from the hardware specification explicitly states that the 

kernel driver should make sure that only one app/queue is using this at 

the same time. That might work for now since we should only have a 

single compute priority queue, but we are not 100% sure yet.

Apart from that the hardware documentation only says that it's a nice to 

have feature and when we pinged firmware engineers to get more 

information they didn't know the feature immediately either.

That is usually a strong indicator that stuff was implemented in the 

hardware, but not fully completed and tested by the firmware team and 

validation team.

Alex and I need to confirm that this feature actually works the way it 

should and that it's validated/stable/read for production use.

Regards,

Christian.

> I'm happy to expand this to add the rest of what's needed as well.

>

> Thanks,

> Friedrich

>

>>

>> Regards,

>> Christian.

>>

>> Am 08.12.23 um 09:19 schrieb Friedrich Vock:

>>> Friendly ping on this one.

>>> Userspace side got merged, so would be great to land this patch too :)

>>>

>>> On 02.12.23 01:17, Friedrich Vock wrote:

>>>> This improves latency if the GPU is already busy with other work.

>>>> This is useful for VR compositors that submit highly latency-sensitive

>>>> compositing work on high-priority compute queues while the GPU is busy

>>>> rendering the next frame.

>>>>

>>>> Userspace merge request:

>>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462

>>>>

>>>> Signed-off-by: Friedrich Vock <friedrich.vock@xxxxxx>

>>>> ---

>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  1 +

>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 10 ++++++----

>>>>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   |  3 ++-

>>>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   |  3 ++-

>>>>   4 files changed, 11 insertions(+), 6 deletions(-)

>>>>

>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h

>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

>>>> index 9505dc8f9d69..4b923a156c4e 100644

>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h

>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

>>>> @@ -790,6 +790,7 @@ struct amdgpu_mqd_prop {

>>>>       uint64_t eop_gpu_addr;

>>>>       uint32_t hqd_pipe_priority;

>>>>       uint32_t hqd_queue_priority;

>>>> +    bool allow_tunneling;

>>>>       bool hqd_active;

>>>>   };

>>>>

>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c

>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c

>>>> index 231d49132a56..4d98e8879be8 100644

>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c

>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c

>>>> @@ -620,6 +620,10 @@ static void amdgpu_ring_to_mqd_prop(struct

>>>> amdgpu_ring *ring,

>>>>                       struct amdgpu_mqd_prop *prop)

>>>>   {

>>>>       struct amdgpu_device *adev = ring->adev;

>>>> +    bool is_high_prio_compute = ring->funcs->type ==

>>>> AMDGPU_RING_TYPE_COMPUTE &&

>>>> + amdgpu_gfx_is_high_priority_compute_queue(adev, ring);

>>>> +    bool is_high_prio_gfx = ring->funcs->type ==

>>>> AMDGPU_RING_TYPE_GFX &&

>>>> + amdgpu_gfx_is_high_priority_graphics_queue(adev, ring);

>>>>

>>>>       memset(prop, 0, sizeof(*prop));

>>>>

>>>> @@ -637,10 +641,8 @@ static void amdgpu_ring_to_mqd_prop(struct

>>>> amdgpu_ring *ring,

>>>>        */

>>>>       prop->hqd_active = ring->funcs->type == AMDGPU_RING_TYPE_KIQ;

>>>>

>>>> -    if ((ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE &&

>>>> -         amdgpu_gfx_is_high_priority_compute_queue(adev, ring)) ||

>>>> -        (ring->funcs->type == AMDGPU_RING_TYPE_GFX &&

>>>> -         amdgpu_gfx_is_high_priority_graphics_queue(adev, ring))) {

>>>> +    prop->allow_tunneling = is_high_prio_compute;

>>>> +    if (is_high_prio_compute || is_high_prio_gfx) {

>>>>           prop->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_HIGH;

>>>>           prop->hqd_queue_priority = 

>>>> AMDGPU_GFX_QUEUE_PRIORITY_MAXIMUM;

>>>>       }

>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

>>>> index c8a3bf01743f..73f6d7e72c73 100644

>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

>>>> @@ -6593,7 +6593,8 @@ static int gfx_v10_0_compute_mqd_init(struct

>>>> amdgpu_device *adev, void *m,

>>>>       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, ENDIAN_SWAP, 1);

>>>>   #endif

>>>>       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 0);

>>>> -    tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH, 0);

>>>> +    tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH,

>>>> +                prop->allow_tunneling);

>>>>       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, PRIV_STATE, 1);

>>>>       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, KMD_QUEUE, 1);

>>>>       mqd->cp_hqd_pq_control = tmp;

>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c

>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c

>>>> index c659ef0f47ce..bdcf96df69e6 100644

>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c

>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c

>>>> @@ -3847,7 +3847,8 @@ static int gfx_v11_0_compute_mqd_init(struct

>>>> amdgpu_device *adev, void *m,

>>>>       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, RPTR_BLOCK_SIZE,

>>>>                   (order_base_2(AMDGPU_GPU_PAGE_SIZE / 4) - 1));

>>>>       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 0);

>>>> -    tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH, 0);

>>>> +    tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH,

>>>> +                prop->allow_tunneling);

>>>>       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, PRIV_STATE, 1);

>>>>       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, KMD_QUEUE, 1);

>>>>       mqd->cp_hqd_pq_control = tmp;

>>>> -- 

>>>> 2.43.0

>>>>

>>