Re: [PATCH v3 2/4] media: rockchip: Introduce the rkvdec2 driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Am 27.06.24 um 11:41 schrieb Jonas Karlman:
Hi Alex,

On 2024-06-26 11:12, Alex Bee wrote:
Hi Detlev,

Am 25.06.24 um 18:56 schrieb Detlev Casanova:
Hi Alex,

On Sunday, June 23, 2024 5:33:28 A.M. EDT you wrote:
Hi Detlev,

Am 20.06.24 um 16:19 schrieb Detlev Casanova:
This driver supports the second generation of the Rockchip Video
decoder, also known as vdpu34x.
It is currently only used on the RK3588(s) SoC.

There are 2 decoders on the RK3588 SoC that can work in pair to decode
8K video at 30 FPS but currently, only using one core at a time is
supported.

Scheduling requests between the two cores will be implemented later.

The core supports H264, HEVC, VP9 and AVS2 decoding but this driver
currently only supports H264.

The driver is based on rkvdec and they may share some code in the
future.
The decision to make a different driver is mainly because rkvdec2 has
more features and can work with multiple cores.

The registers are mapped in a struct in RAM using bitfields. It is IO
copied to the HW when all values are configured.
The decision to use such a struct instead of writing buffers one by one

is based on the following reasons:
    - Rockchip cores are known to misbehave when registers are not written
in address order, - Those cores also need the software to write all registers, even if they are written their default values or are not related to the task
      (this core will not start decoding some H264 frames if some VP9
      registers are not written to 0)
- In the future, to support multiple cores, the scheduler could be optimized by storing the precomputed registers values and copy them
      to the HW as soos as a core becomes available.

This makes the code more readable and may bring performance improvements
in future features.

Signed-off-by: Detlev Casanova <detlev.casanova@xxxxxxxxxxxxx>
---

    drivers/staging/media/Kconfig                |    1 +
    drivers/staging/media/Makefile               |    1 +
    drivers/staging/media/rkvdec2/Kconfig        |   15 +
    drivers/staging/media/rkvdec2/Makefile       |    3 +
    drivers/staging/media/rkvdec2/TODO           |    9 +
    drivers/staging/media/rkvdec2/rkvdec2-h264.c |  739 +++++++++++
    drivers/staging/media/rkvdec2/rkvdec2-regs.h |  345 +++++
    drivers/staging/media/rkvdec2/rkvdec2.c      | 1253 ++++++++++++++++++
    drivers/staging/media/rkvdec2/rkvdec2.h      |  130 ++
    9 files changed, 2496 insertions(+)
    create mode 100644 drivers/staging/media/rkvdec2/Kconfig
    create mode 100644 drivers/staging/media/rkvdec2/Makefile
    create mode 100644 drivers/staging/media/rkvdec2/TODO
    create mode 100644 drivers/staging/media/rkvdec2/rkvdec2-h264.c
    create mode 100644 drivers/staging/media/rkvdec2/rkvdec2-regs.h
    create mode 100644 drivers/staging/media/rkvdec2/rkvdec2.c
    create mode 100644 drivers/staging/media/rkvdec2/rkvdec2.h
...

+static inline void rkvdec2_memcpy_toio(void __iomem *dst, void *src,
size_t len) +{
+#ifdef CONFIG_ARM64
+	__iowrite32_copy(dst, src, len);
+#elif defined(CONFIG_ARM)
I guess that can get an "#else" since memcpy_toio exists for all archs.

+	memcpy_toio(dst, src, len);
+#endif
+}
+
...

+	/* Set timeout threshold */
+	if (pixels < RKVDEC2_1080P_PIXELS)
+		regs->common.timeout_threshold = RKVDEC2_TIMEOUT_1080p;
+	else if (pixels < RKVDEC2_4K_PIXELS)
+		regs->common.timeout_threshold = RKVDEC2_TIMEOUT_4K;
+	else if (pixels < RKVDEC2_8K_PIXELS)
+		regs->common.timeout_threshold = RKVDEC2_TIMEOUT_8K;
+
Did you test if it works with anything > 8K? If so, you propably want to
make the check above

+	else
+		regs->common.timeout_threshold = RKVDEC2_TIMEOUT_8K;

Otherwise the timeout may not be set/contain invalid values from any former
stream.
That's right, but it would be set to 0 because of the memset.
RKVDEC2_TIMEOUT_8K might not be enough for bigger frame sizes, so I'll set it
to the maximum value (0xffffffff) when frames are bigger than 8K and also adapt
the watchdog time: RKVDEC2_TIMEOUT_8K is around 100 ms, but 0xffffffff is arnoud
5.3 seconds (reg032/axi_clock_freq)

I'll do more tests with this as well.

...

+
+static const struct rkvdec2_coded_fmt_desc rkvdec2_coded_fmts[] = {
+	{
+		.fourcc = V4L2_PIX_FMT_H264_SLICE,
+		.frmsize = {
+			.min_width = 16,
+			.max_width =  65520,
+			.step_width = 16,
+			.min_height = 16,
+			.max_height =  65520,
+			.step_height = 16,
+		},
+		.ctrls = &rkvdec2_h264_ctrls,
+		.ops = &rkvdec2_h264_fmt_ops,
+		.num_decoded_fmts =
ARRAY_SIZE(rkvdec2_h264_decoded_fmts),
+		.decoded_fmts = rkvdec2_h264_decoded_fmts,
+		.subsystem_flags =
VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF,
+	},
+};
+
Note, that this is also given to userspace (VIDIOC_ENUM_FRAMESIZES) and
this is already incorrect in the old rkvdec driver (and hantro): From
userspace perspective we do not have a restriction in
step_width/step_width, as we are aligning any given width/height to HW
requirements in the driver - what we should give to userspace is
fsize->type = V4L2_FRMSIZE_TYPE_CONTINUOUS; fsize->stepwise.min_height =
1; fsize->stepwise.min_width = 1; fsize->stepwise.max_height = 65520;
fsize->stepwise.max_width = 65520;
Is fsize->stepwise.min_height = 1; and fsize->stepwise.min_width = 1 correct ?
Or do you mean fsize->stepwise.step_height = 1; and fsize->stepwise.setp_width
= 1 ?

It would give this instead:

.frmsize = {
	.min_width = 16,
	.max_width =  65520,
	.step_width = 1,
	.min_height = 16,
	.max_height =  65520,
	.step_height = 1,
},

and .vidioc_enum_framesizes sets fsize->type = V4L2_FRMSIZE_TYPE_CONTINUOUS;
You can't adapt this here, because this .frmsize is also given to the
v4l2_apply_frmsize_constraints helper, which does the actual alignment to
HW requirements and requires the HW step_with and step_height.
IIRC, we also align framesizes which are below minimum HW requirement, at
least in rkvdec1 driver and it looks a lot like this is done here the same:
so this should be .min_height = 1 and .min_width = 1. (I remember because
there are VP9 conformance tests with very small framesizes). And yes, it
looks like you've had to set .step_width and .step_height to 1 for
V4L2_FRMSIZE_TYPE_CONTINUOUS, not sure why that is required.

So, imho, the final rkvdec2_enum_framesizes should look like

+static int rkvdec2_enum_framesizes(struct file *file, void *priv,
+                   struct v4l2_frmsizeenum *fsize)
....
+    fmt = rkvdec2_find_coded_fmt_desc(fsize->pixel_format);
+    if (!fmt)
+        return -EINVAL;
+
+    fsize->type = V4L2_FRMSIZE_TYPE_CONTINUOUS;
+    fsize->stepwise.min_height = 1;
+    fsize->stepwise.max_height = fmt->frmsize.max_height;
+    fsize->stepwise.min_width = 1;
+    fsize->stepwise.max_width = fmt->frmsize.max_width;
+    fsize->stepwise.min_width = 1;
+    fsize->stepwise.step_height = 1;
+    fsize->stepwise.step_width = 1;
+    return 0;
+}

Note: Not even build tested :)
Jonas: maybe you can add a fixup patch to your rkvdec patches as well.
Thanks, will take a closer look and include something for rkvdec high10
v6 later today/tomorrow.
Great, thanks.

To back up the "IIRC" part of my previous reply about .min_width and
.min_height: v4l2_apply_frmsize_constraints does clamp_roundup on width and
height, which means it sets height = hw_min_height and/or width =
hw_min_width if the given width/height is smaller. Thus userspace doesn't
need to care about width/height as long as it's 1 <=
hw_max_height/hw_max_width.

Alex

Regards,
Jonas

Regards,

Alex

I guess this new driver should be an
opportunity to fix that and distinguish between internal and external
frame size requirements and the .vidioc_enum_framesizes callback should
adapted accordingly. Regards, Alex
Detlev.




[Index of Archives]     [Linux Driver Development]     [Linux Driver Backports]     [DMA Engine]     [Linux GPIO]     [Linux SPI]     [Video for Linux]     [Linux USB Devel]     [Linux Coverity]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]
  Powered by Linux