Re: [PATCH v2] drm/amdgpu: add workarounds for Raven VCN TMZ issue

Christian König <christian.koenig@xxxxxxx> · Thu, 10 Mar 2022 15:25:59 +0100

Ok, thanks.

Lang is that case your patch should work fine.

Just add another patch with a check for the encode case to reject any CS 
with TMZ buffers in it.

Thanks,
Christian.

Am 10.03.22 um 15:17 schrieb Leo Liu:
No need for encode. Encrypting uses TEE/TA to convert clear bitstream 
to encrypted bitstream, and has nothing to do with VCN encode and tmz.

Regards,

Leo


On 2022-03-10 04:53, Christian König wrote:
Leo you didn't answered the question if we need TMZ for encode as well.

Regards,
Christian.

Am 10.03.22 um 09:45 schrieb Lang Yu:
Ping.

On 03/08/ , Leo Liu wrote:
On 2022-03-08 11:18, Leo Liu wrote:
On 2022-03-08 04:16, Christian König wrote:
Am 08.03.22 um 09:06 schrieb Lang Yu:
On 03/08/ , Christian König wrote:
Am 08.03.22 um 08:33 schrieb Lang Yu:
On 03/08/ , Christian König wrote:
Am 08.03.22 um 04:39 schrieb Lang Yu:
It is a hardware issue that VCN can't handle a GTT
backing stored TMZ buffer on Raven.

Move such a TMZ buffer to VRAM domain before command
submission.

v2:
     - Use patch_cs_in_place callback.

Suggested-by: Christian König <christian.koenig@xxxxxxx>
Signed-off-by: Lang Yu <Lang.Yu@xxxxxxx>
---
     drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 68
+++++++++++++++++++++++++++
     1 file changed, 68 insertions(+)

diff --git
a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index 7bbb9ba6b80b..810932abd3af 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -24,6 +24,7 @@
     #include <linux/firmware.h>
     #include "amdgpu.h"
+#include "amdgpu_cs.h"
     #include "amdgpu_vcn.h"
     #include "amdgpu_pm.h"
     #include "soc15.h"
@@ -1905,6 +1906,72 @@ static const struct
amd_ip_funcs vcn_v1_0_ip_funcs = {
         .set_powergating_state = 
vcn_v1_0_set_powergating_state,
     };
+/**
+ * It is a hardware issue that Raven VCN can't
handle a GTT TMZ buffer.
+ * Move such a GTT TMZ buffer to VRAM domain
before command submission.
+ */
+static int vcn_v1_0_validate_bo(struct amdgpu_cs_parser 
*parser,
+                struct amdgpu_job *job,
+                uint64_t addr)
+{
+    struct ttm_operation_ctx ctx = { false, false };
+    struct amdgpu_fpriv *fpriv = parser->filp->driver_priv;
+    struct amdgpu_vm *vm = &fpriv->vm;
+    struct amdgpu_bo_va_mapping *mapping;
+    struct amdgpu_bo *bo;
+    int r;
+
+    addr &= AMDGPU_GMC_HOLE_MASK;
+    if (addr & 0x7) {
+        DRM_ERROR("VCN messages must be 8 byte aligned!\n");
+        return -EINVAL;
+    }
+
+    mapping = amdgpu_vm_bo_lookup_mapping(vm,
addr/AMDGPU_GPU_PAGE_SIZE);
+    if (!mapping || !mapping->bo_va || 
!mapping->bo_va->base.bo)
+        return -EINVAL;
+
+    bo = mapping->bo_va->base.bo;
+    if (!(bo->flags & AMDGPU_GEM_CREATE_ENCRYPTED))
+        return 0;
+
+    amdgpu_bo_placement_from_domain(bo, 
AMDGPU_GEM_DOMAIN_VRAM);
+    r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+    if (r) {
+        DRM_ERROR("Failed validating the VCN
message BO (%d)!\n", r);
+        return r;
+    }
Well, exactly that won't work.

The message structure isn't TMZ protected because
otherwise the driver won't
be able to stitch it together.

What is TMZ protected are the surfaces the message
structure is pointing to.
So what you would need to do is to completely parse
the structure and then
move on the relevant buffers into VRAM.

Leo or James, can you help with that?
   From my observations, when decoding secure contents, register
GPCOM_VCPU_DATA0 and GPCOM_VCPU_DATA1 are set to a TMZ
buffer address.
And this way works when allocating TMZ buffers in GTT domain.
As far as I remember that's only the case for the decoding,
encoding works
by putting the addresses into the message buffer.

But could be that decoding is sufficient, Leo and James need
to comment on
this.
It seems that only decode needs TMZ buffers. Only observe
si_vid_create_tmz_buffer()
was called in rvcn_dec_message_decode() in
src/gallium/drivers/radeon/radeon_vcn_dec.c.
Mhm, good point. Let's wait for Leo and James to wake up, when we
don't need encode support than that would makes things much easier.
For secure playback, the buffer required in TMZ are dpb, dt and 
ctx, for
the rest esp. those for CPU access don't need that E.g. msg 
buffer, and
bitstream buffer.

 From radeon_vcn_dec.c, you can see the buffer for dpb and ctx, 
and dt
buffer frontend/va/surface is set to PIPE_BIND_PROTECTED.


Regards,

Leo

For VCN1, due to performance reason, the msg and fb buffer was 
allocated
into VRAM instead of GTT(for other HW), but those are not TMZ in 
order to
have CPU access.


Regards,

Leo




Regards,
Christian.

Regards,
Lang

Regards,
Christian.

Regards,
Lang

Regards,
Christian.

+
+    return r;
+}
+
+static int
vcn_v1_0_ring_patch_cs_in_place(struct
amdgpu_cs_parser *p,
+                       struct amdgpu_job *job,
+                       struct amdgpu_ib *ib)
+{
+    uint32_t msg_lo = 0, msg_hi = 0;
+    int i, r;
+
+    for (i = 0; i < ib->length_dw; i += 2) {
+        uint32_t reg = amdgpu_ib_get_value(ib, i);
+        uint32_t val = amdgpu_ib_get_value(ib, i + 1);
+
+        if (reg == PACKET0(p->adev->vcn.internal.data0, 0)) {
+            msg_lo = val;
+        } else if (reg ==
PACKET0(p->adev->vcn.internal.data1, 0)) {
+            msg_hi = val;
+        } else if (reg == 
PACKET0(p->adev->vcn.internal.cmd, 0)) {
+            r = vcn_v1_0_validate_bo(p, job,
+                         ((u64)msg_hi) << 32 | msg_lo);
+            if (r)
+                return r;
+        }
+    }
+
+    return 0;
+}
+
+
     static const struct amdgpu_ring_funcs
vcn_v1_0_dec_ring_vm_funcs = {
         .type = AMDGPU_RING_TYPE_VCN_DEC,
         .align_mask = 0xf,
@@ -1914,6 +1981,7 @@ static const struct
amdgpu_ring_funcs vcn_v1_0_dec_ring_vm_funcs = {
         .get_rptr = vcn_v1_0_dec_ring_get_rptr,
         .get_wptr = vcn_v1_0_dec_ring_get_wptr,
         .set_wptr = vcn_v1_0_dec_ring_set_wptr,
+    .patch_cs_in_place = vcn_v1_0_ring_patch_cs_in_place,
         .emit_frame_size =
             6 + 6 + /* hdp invalidate / flush */
             SOC15_FLUSH_GPU_TLB_NUM_WREG * 6 +