On 2022-03-08 04:16, Christian König wrote:
> Am 08.03.22 um 09:06 schrieb Lang Yu:
>> On 03/08/ , Christian König wrote:
>>> Am 08.03.22 um 08:33 schrieb Lang Yu:
>>>> On 03/08/ , Christian König wrote:
>>>>> Am 08.03.22 um 04:39 schrieb Lang Yu:
>>>>>> It is a hardware issue that VCN can't handle a GTT backing stored
>>>>>> TMZ buffer on Raven. Move such a TMZ buffer to VRAM domain before
>>>>>> command submission.
>>>>>>
>>>>>> v2:
>>>>>>  - Use patch_cs_in_place callback.
>>>>>>
>>>>>> Suggested-by: Christian König <christian.koenig@xxxxxxx>
>>>>>> Signed-off-by: Lang Yu <Lang.Yu@xxxxxxx>
>>>>>> ---
>>>>>>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 68 +++++++++++++++++++++++++++
>>>>>>  1 file changed, 68 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>> index 7bbb9ba6b80b..810932abd3af 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>> @@ -24,6 +24,7 @@
>>>>>>  #include <linux/firmware.h>
>>>>>>
>>>>>>  #include "amdgpu.h"
>>>>>> +#include "amdgpu_cs.h"
>>>>>>  #include "amdgpu_vcn.h"
>>>>>>  #include "amdgpu_pm.h"
>>>>>>  #include "soc15.h"
>>>>>> @@ -1905,6 +1906,72 @@ static const struct amd_ip_funcs vcn_v1_0_ip_funcs = {
>>>>>>  	.set_powergating_state = vcn_v1_0_set_powergating_state,
>>>>>>  };
>>>>>>
>>>>>> +/**
>>>>>> + * It is a hardware issue that Raven VCN can't handle a GTT TMZ buffer.
>>>>>> + * Move such a GTT TMZ buffer to VRAM domain before command submission.
>>>>>> + */
>>>>>> +static int vcn_v1_0_validate_bo(struct amdgpu_cs_parser *parser,
>>>>>> +				struct amdgpu_job *job,
>>>>>> +				uint64_t addr)
>>>>>> +{
>>>>>> +	struct ttm_operation_ctx ctx = { false, false };
>>>>>> +	struct amdgpu_fpriv *fpriv = parser->filp->driver_priv;
>>>>>> +	struct amdgpu_vm *vm = &fpriv->vm;
>>>>>> +	struct amdgpu_bo_va_mapping *mapping;
>>>>>> +	struct amdgpu_bo *bo;
>>>>>> +	int r;
>>>>>> +
>>>>>> +	addr &= AMDGPU_GMC_HOLE_MASK;
>>>>>> +	if (addr & 0x7) {
>>>>>> +		DRM_ERROR("VCN messages must be 8 byte aligned!\n");
>>>>>> +		return -EINVAL;
>>>>>> +	}
>>>>>> +
>>>>>> +	mapping = amdgpu_vm_bo_lookup_mapping(vm, addr/AMDGPU_GPU_PAGE_SIZE);
>>>>>> +	if (!mapping || !mapping->bo_va || !mapping->bo_va->base.bo)
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>> +	bo = mapping->bo_va->base.bo;
>>>>>> +	if (!(bo->flags & AMDGPU_GEM_CREATE_ENCRYPTED))
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
>>>>>> +	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>>>>>> +	if (r) {
>>>>>> +		DRM_ERROR("Failed validating the VCN message BO (%d)!\n", r);
>>>>>> +		return r;
>>>>>> +	}
>>>>> Well, exactly that won't work.
>>>>>
>>>>> The message structure isn't TMZ protected because otherwise the driver
>>>>> won't be able to stitch it together.
>>>>>
>>>>> What is TMZ protected are the surfaces the message structure is pointing
>>>>> to. So what you would need to do is to completely parse the structure
>>>>> and then move the relevant buffers into VRAM.
>>>>>
>>>>> Leo or James, can you help with that?
>>>> From my observations, when decoding secure contents, registers
>>>> GPCOM_VCPU_DATA0 and GPCOM_VCPU_DATA1 are set to a TMZ buffer address.
>>>> And this way works when allocating TMZ buffers in GTT domain.
>>> As far as I remember that's only the case for the decoding; encoding
>>> works by putting the addresses into the message buffer.
>>>
>>> But it could be that decoding is sufficient, Leo and James need to
>>> comment on this.
>> It seems that only decode needs TMZ buffers. I only observe
>> si_vid_create_tmz_buffer() being called in rvcn_dec_message_decode() in
>> src/gallium/drivers/radeon/radeon_vcn_dec.c.
> Mhm, good point. Let's wait for Leo and James to wake up; if we don't
> need encode support then that would make things much easier.
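[Editor's note: the mechanism discussed above, assembling a 64-bit message address from the two 32-bit DATA0/DATA1 register writes, masking it and rejecting unaligned addresses, can be sketched as a small user-space mock-up. The mask constant is written to mirror the kernel's AMDGPU_GMC_HOLE_MASK; the helper names and the harness itself are invented for illustration and are not part of the patch.]

```c
#include <assert.h>
#include <stdint.h>

/* User-space mock-up of the address handling in vcn_v1_0_validate_bo():
 * DATA0 carries the low 32 bits and DATA1 the high 32 bits of the VCN
 * message address; the assembled address is masked to the GMC address
 * range and must be 8-byte aligned.  MOCK_GMC_HOLE_MASK mirrors the
 * kernel's AMDGPU_GMC_HOLE_MASK (low 48 bits); everything else here is
 * purely illustrative. */

#define MOCK_GMC_HOLE_MASK 0x0000ffffffffffffULL
#define MOCK_EINVAL 22

/* Assemble the 64-bit message address from the two register writes. */
uint64_t vcn_msg_addr(uint32_t msg_lo, uint32_t msg_hi)
{
	return (((uint64_t)msg_hi << 32) | msg_lo) & MOCK_GMC_HOLE_MASK;
}

/* 0 if the address is usable, -EINVAL if it is not 8-byte aligned. */
int vcn_msg_addr_check(uint64_t addr)
{
	return (addr & 0x7) ? -MOCK_EINVAL : 0;
}
```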
For secure playback, the buffers required in TMZ are dpb, dt and ctx; the
rest, esp. those for CPU access, don't need that, e.g. the msg buffer and
the bitstream buffer.

From radeon_vcn_dec.c, you can see the buffers for dpb and ctx, and the dt
buffer from frontend/va/surface is set to PIPE_BIND_PROTECTED.
Regards, Leo
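[Editor's note: Leo's classification of which VCN buffers need TMZ protection can be summarized in a small sketch. The enum and helper below are invented for illustration; they do not exist in the driver.]

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the classification above (names invented for illustration):
 * for secure playback only the dpb, dt and ctx buffers need TMZ
 * protection; CPU-accessed buffers such as the message buffer and the
 * bitstream buffer do not. */

enum mock_vcn_buf { BUF_MSG, BUF_BITSTREAM, BUF_DPB, BUF_DT, BUF_CTX };

bool needs_tmz(enum mock_vcn_buf b)
{
	switch (b) {
	case BUF_DPB:	/* decoded picture buffer */
	case BUF_DT:	/* decode target */
	case BUF_CTX:	/* context buffer */
		return true;
	default:	/* msg, bitstream: CPU-written, stay unprotected */
		return false;
	}
}
```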
> Regards,
> Christian.
>> Regards,
>> Lang
>>> Regards,
>>> Christian.
>>>> Regards,
>>>> Lang
>>>>> Regards,
>>>>> Christian.
>>>>>> +
>>>>>> +	return r;
>>>>>> +}
>>>>>> +
>>>>>> +static int vcn_v1_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p,
>>>>>> +					   struct amdgpu_job *job,
>>>>>> +					   struct amdgpu_ib *ib)
>>>>>> +{
>>>>>> +	uint32_t msg_lo = 0, msg_hi = 0;
>>>>>> +	int i, r;
>>>>>> +
>>>>>> +	for (i = 0; i < ib->length_dw; i += 2) {
>>>>>> +		uint32_t reg = amdgpu_ib_get_value(ib, i);
>>>>>> +		uint32_t val = amdgpu_ib_get_value(ib, i + 1);
>>>>>> +
>>>>>> +		if (reg == PACKET0(p->adev->vcn.internal.data0, 0)) {
>>>>>> +			msg_lo = val;
>>>>>> +		} else if (reg == PACKET0(p->adev->vcn.internal.data1, 0)) {
>>>>>> +			msg_hi = val;
>>>>>> +		} else if (reg == PACKET0(p->adev->vcn.internal.cmd, 0)) {
>>>>>> +			r = vcn_v1_0_validate_bo(p, job,
>>>>>> +						 ((u64)msg_hi) << 32 | msg_lo);
>>>>>> +			if (r)
>>>>>> +				return r;
>>>>>> +		}
>>>>>> +	}
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>>  static const struct amdgpu_ring_funcs vcn_v1_0_dec_ring_vm_funcs = {
>>>>>>  	.type = AMDGPU_RING_TYPE_VCN_DEC,
>>>>>>  	.align_mask = 0xf,
>>>>>> @@ -1914,6 +1981,7 @@ static const struct amdgpu_ring_funcs vcn_v1_0_dec_ring_vm_funcs = {
>>>>>>  	.get_rptr = vcn_v1_0_dec_ring_get_rptr,
>>>>>>  	.get_wptr = vcn_v1_0_dec_ring_get_wptr,
>>>>>>  	.set_wptr = vcn_v1_0_dec_ring_set_wptr,
>>>>>> +	.patch_cs_in_place = vcn_v1_0_ring_patch_cs_in_place,
>>>>>>  	.emit_frame_size =
>>>>>>  		6 + 6 + /* hdp invalidate / flush */
>>>>>>  		SOC15_FLUSH_GPU_TLB_NUM_WREG * 6 +
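[Editor's note: Christian's suggested alternative, parsing the decode message and relocating the surfaces it points to rather than the message buffer itself, might look roughly like the sketch below. The struct layout and field names are invented for illustration and do NOT match the real VCN message format.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rough sketch of the suggested approach: parse the decode message and
 * collect the addresses of the surfaces it points to (dpb, dt, ctx),
 * since those, not the message buffer itself, are the TMZ buffers that
 * would have to be moved into VRAM.  A real implementation would then
 * look up and validate each BO, as vcn_v1_0_validate_bo() does for the
 * message address. */

struct mock_dec_msg {
	uint64_t dpb_addr;	/* decoded picture buffer */
	uint64_t dt_addr;	/* decode target */
	uint64_t ctx_addr;	/* context buffer */
};

/* Collect the non-zero surface addresses from a message; returns count. */
size_t collect_surface_addrs(const struct mock_dec_msg *msg,
			     uint64_t *out, size_t max)
{
	const uint64_t addrs[] = { msg->dpb_addr, msg->dt_addr, msg->ctx_addr };
	size_t i, n = 0;

	for (i = 0; i < sizeof(addrs) / sizeof(addrs[0]); i++)
		if (addrs[i] && n < max)
			out[n++] = addrs[i];
	return n;
}
```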