Hi Alex, On 2024-06-26 11:12, Alex Bee wrote: > Hi Detlev, > > Am 25.06.24 um 18:56 schrieb Detlev Casanova: >> Hi Alex, >> >> On Sunday, June 23, 2024 5:33:28 A.M. EDT you wrote: >>> Hi Detlev, >>> >>> Am 20.06.24 um 16:19 schrieb Detlev Casanova: >>>> This driver supports the second generation of the Rockchip Video >>>> decoder, also known as vdpu34x. >>>> It is currently only used on the RK3588(s) SoC. >>>> >>>> There are 2 decoders on the RK3588 SoC that can work in pair to decode >>>> 8K video at 30 FPS but currently, only using one core at a time is >>>> supported. >>>> >>>> Scheduling requests between the two cores will be implemented later. >>>> >>>> The core supports H264, HEVC, VP9 and AVS2 decoding but this driver >>>> currently only supports H264. >>>> >>>> The driver is based on rkvdec and they may share some code in the >>>> future. >>>> The decision to make a different driver is mainly because rkvdec2 has >>>> more features and can work with multiple cores. >>>> >>>> The registers are mapped in a struct in RAM using bitfields. It is IO >>>> copied to the HW when all values are configured. >>>> The decision to use such a struct instead of writing buffers one by one >>>> >>>> is based on the following reasons: >>>> - Rockchip cores are known to misbehave when registers are not written >>>> >>>> in address order, >>>> >>>> - Those cores also need the software to write all registers, even if >>>> >>>> they are written their default values or are not related to the task >>>> (this core will not start decoding some H264 frames if some VP9 >>>> registers are not written to 0) >>>> >>>> - In the future, to support multiple cores, the scheduler could be >>>> >>>> optimized by storing the precomputed registers values and copy them >>>> to the HW as soos as a core becomes available. >>>> >>>> This makes the code more readable and may bring performance improvements >>>> in future features. >>>> >>>> Signed-off-by: Detlev Casanova <detlev.casanova@xxxxxxxxxxxxx> >>>> --- >>>> >>>> drivers/staging/media/Kconfig | 1 + >>>> drivers/staging/media/Makefile | 1 + >>>> drivers/staging/media/rkvdec2/Kconfig | 15 + >>>> drivers/staging/media/rkvdec2/Makefile | 3 + >>>> drivers/staging/media/rkvdec2/TODO | 9 + >>>> drivers/staging/media/rkvdec2/rkvdec2-h264.c | 739 +++++++++++ >>>> drivers/staging/media/rkvdec2/rkvdec2-regs.h | 345 +++++ >>>> drivers/staging/media/rkvdec2/rkvdec2.c | 1253 ++++++++++++++++++ >>>> drivers/staging/media/rkvdec2/rkvdec2.h | 130 ++ >>>> 9 files changed, 2496 insertions(+) >>>> create mode 100644 drivers/staging/media/rkvdec2/Kconfig >>>> create mode 100644 drivers/staging/media/rkvdec2/Makefile >>>> create mode 100644 drivers/staging/media/rkvdec2/TODO >>>> create mode 100644 drivers/staging/media/rkvdec2/rkvdec2-h264.c >>>> create mode 100644 drivers/staging/media/rkvdec2/rkvdec2-regs.h >>>> create mode 100644 drivers/staging/media/rkvdec2/rkvdec2.c >>>> create mode 100644 drivers/staging/media/rkvdec2/rkvdec2.h >>> ... >>> >>>> +static inline void rkvdec2_memcpy_toio(void __iomem *dst, void *src, >>>> size_t len) +{ >>>> +#ifdef CONFIG_ARM64 >>>> + __iowrite32_copy(dst, src, len); >>>> +#elif defined(CONFIG_ARM) >>> I guess that can get an "#else" since memcpy_toio exists for all archs. >>> >>>> + memcpy_toio(dst, src, len); >>>> +#endif >>>> +} >>>> + >>> ... >>> >>>> + /* Set timeout threshold */ >>>> + if (pixels < RKVDEC2_1080P_PIXELS) >>>> + regs->common.timeout_threshold = RKVDEC2_TIMEOUT_1080p; >>>> + else if (pixels < RKVDEC2_4K_PIXELS) >>>> + regs->common.timeout_threshold = RKVDEC2_TIMEOUT_4K; >>>> + else if (pixels < RKVDEC2_8K_PIXELS) >>>> + regs->common.timeout_threshold = RKVDEC2_TIMEOUT_8K; >>>> + >>> Did you test if it works with anything > 8K? If so, you propably want to >>> make the check above >>> >>> + else >>> + regs->common.timeout_threshold = RKVDEC2_TIMEOUT_8K; >>> >>> Otherwise the timeout may not be set/contain invalid values from any former >>> stream. >> That's right, but it would be set to 0 because of the memset. >> RKVDEC2_TIMEOUT_8K might not be enough for bigger frame sizes, so I'll set it >> to the maximum value (0xffffffff) when frames are bigger than 8K and also adapt >> the watchdog time: RKVDEC2_TIMEOUT_8K is around 100 ms, but 0xffffffff is arnoud >> 5.3 seconds (reg032/axi_clock_freq) >> >> I'll do more tests with this as well. >> >>> ... >>> >>>> + >>>> +static const struct rkvdec2_coded_fmt_desc rkvdec2_coded_fmts[] = { >>>> + { >>>> + .fourcc = V4L2_PIX_FMT_H264_SLICE, >>>> + .frmsize = { >>>> + .min_width = 16, >>>> + .max_width = 65520, >>>> + .step_width = 16, >>>> + .min_height = 16, >>>> + .max_height = 65520, >>>> + .step_height = 16, >>>> + }, >>>> + .ctrls = &rkvdec2_h264_ctrls, >>>> + .ops = &rkvdec2_h264_fmt_ops, >>>> + .num_decoded_fmts = >> ARRAY_SIZE(rkvdec2_h264_decoded_fmts), >>>> + .decoded_fmts = rkvdec2_h264_decoded_fmts, >>>> + .subsystem_flags = >> VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF, >>>> + }, >>>> +}; >>>> + >>> Note, that this is also given to userspace (VIDIOC_ENUM_FRAMESIZES) and >>> this is already incorrect in the old rkvdec driver (and hantro): From >>> userspace perspective we do not have a restriction in >>> step_width/step_width, as we are aligning any given width/height to HW >>> requirements in the driver - what we should give to userspace is >>> fsize->type = V4L2_FRMSIZE_TYPE_CONTINUOUS; fsize->stepwise.min_height = >>> 1; fsize->stepwise.min_width = 1; fsize->stepwise.max_height = 65520; >>> fsize->stepwise.max_width = 65520; >> Is fsize->stepwise.min_height = 1; and fsize->stepwise.min_width = 1 correct ? >> Or do you mean fsize->stepwise.step_height = 1; and fsize->stepwise.setp_width >> = 1 ? >> >> It would give this instead: >> >> .frmsize = { >> .min_width = 16, >> .max_width = 65520, >> .step_width = 1, >> .min_height = 16, >> .max_height = 65520, >> .step_height = 1, >> }, >> >> and .vidioc_enum_framesizes sets fsize->type = V4L2_FRMSIZE_TYPE_CONTINUOUS; > You can't adapt this here, because this .frmsize is also given to the > v4l2_apply_frmsize_constraints helper, which does the actual alignment to > HW requirements and requires the HW step_with and step_height. > IIRC, we also align framesizes which are below minimum HW requirement, at > least in rkvdec1 driver and it looks a lot like this is done here the same: > so this should be .min_height = 1 and .min_width = 1. (I remember because > there are VP9 conformance tests with very small framesizes). And yes, it > looks like you've had to set .step_width and .step_height to 1 for > V4L2_FRMSIZE_TYPE_CONTINUOUS, not sure why that is required. > > So, imho, the final rkvdec2_enum_framesizes should look like > > +static int rkvdec2_enum_framesizes(struct file *file, void *priv, > + struct v4l2_frmsizeenum *fsize) > .... > + fmt = rkvdec2_find_coded_fmt_desc(fsize->pixel_format); > + if (!fmt) > + return -EINVAL; > + > + fsize->type = V4L2_FRMSIZE_TYPE_CONTINUOUS; > + fsize->stepwise.min_height = 1; > + fsize->stepwise.max_height = fmt->frmsize.max_height; > + fsize->stepwise.min_width = 1; > + fsize->stepwise.max_width = fmt->frmsize.max_width; > + fsize->stepwise.min_width = 1; > + fsize->stepwise.step_height = 1; > + fsize->stepwise.step_width = 1; > + return 0; > +} > > Note: Not even build tested :) > Jonas: maybe you can add a fixup patch to your rkvdec patches as well. Thanks, will take a closer look and include something for rkvdec high10 v6 later today/tomorrow. Regards, Jonas > > Regards, > > Alex > >>> I guess this new driver should be an >>> opportunity to fix that and distinguish between internal and external >>> frame size requirements and the .vidioc_enum_framesizes callback should >>> adapted accordingly. Regards, Alex >> Detlev.