Re: [PATCH v2 0/2] Enable JPEG encoding on rk3588

Link Mauve <linkmauve@xxxxxxxxxxxx> · Fri, 12 Apr 2024 17:07:19 +0200

On Sun, Apr 07, 2024 at 10:08:58AM +0200, Nicolas Dufresne wrote:
> Le vendredi 05 avril 2024 à 16:21 +0200, Link Mauve a écrit :
> > On Thu, Apr 04, 2024 at 01:41:15PM -0400, Nicolas Dufresne wrote:
> > > Hi,
> > 
> > Hi,
> > 
> > > 
> > > Le mercredi 27 mars 2024 à 14:41 +0100, Emmanuel Gil Peyrot a écrit :
> > > > Only the JPEG encoder is available for now, although there are patches
> > > > for the undocumented VP8 encoder floating around[0].
> > > 
> > > [0] seems like a broken link. The VP8 encoder RFC is for RK3399 (and Hantro H1
> > > posted by ST more recently). The TRM says "VEPU121(JPEG encoder only)", which
> > > suggest that the H.264 and VP8 encoders usually found on the VEPU121 are
> > > removed. As Rockchip have remove the synthesize register while modifying the H1
> > > IP, it is difficult to verify. Confusingly the H.264 specific registers are
> > > documented in the TRM around VEPU121.
> > 
> > Ah, the link became, and was indeed ST’s series:
> > https://patchwork.kernel.org/project/linux-rockchip/list/?series=789885&archive=both
> > 
> > But the TRM part 1 says the VEPU121 supports H.264 encoding (page 367),
> > and it’s likely they didn’t remove just VP8 support since the codec
> > features are pretty close to H.264’s.
> > 
> > > 
> > > > 
> > > > This has been tested on a rock-5b, resulting in four /dev/video*
> > > > encoders.  The userspace program I’ve been using to test them is
> > > > Onix[1], using the jpeg-encoder example, it will pick one of these four
> > > > at random (but displays the one it picked):
> > > > % ffmpeg -i <input image> -pix_fmt yuvj420p temp.yuv
> > > > % jpeg-encoder temp.yuv <width> <height> NV12 <quality> output.jpeg
> > > 
> > > I don't like that we exposing each identical cores a separate video nodes. I
> > > think we should aim for 1 device, and then multi-plex and schedule de cores from
> > > inside the Linux kernel.
> > 
> > I agree, but this should be handled in the driver not in the device
> > tree, and it can be done later.
> 
> As the behaviour we want is that these cores becomes a group and get schedule
> together, its certainly a good time to slow down and evaluate if that part needs
> to be improve in the DT too.
> 
> Hantro G1/H1 and VEPU/VDPU121 combos originally shared the same sram region. Its
> not clear if any of these cores have this limitation and if this should be
> expressed in the DT / driver.

The TRM on page 369 mentions that:
> Please note that VDPU121 and VDPU381 and VDPU720 and AV1 and VEPU121 and VEPU580
> is different IP cores, so these processing core can be work together.

I understand that as them not sharing any memory and being able to work
independently.

> 
> > 
> > > 
> > > Not doing this now means we'll never have an optimal hardware usage
> > > distribution. Just consider two userspace software wanting to do jpeg encoding.
> > > If they both take a guess, they may endup using a single core. Where with proper
> > > scheduling in V4L2, the kernel will be able to properly distribute the load. I
> > > insist on this, since if we merge you changes it becomes an ABI and we can't
> > > change it anymore.
> > 
> > Will it really become ABI just like that?  Userspace should always
> > discover the video nodes and their capabilities and not hardcode e.g. a
> > specific /dev/videoN file for a specific codec.  I would argue that this
> > series would let userspace do JPEG encoding right away, even if in a
> > less optimal way than if the driver would round-robin them through a
> > single video node, but that can always be added in a future version.
> 
> Might be on the gray side, but there is good chances software written for your
> specific board can stop working after te grouping is done.

I will send a new series shortly which enables only one of these four
cores, the functionality should be the same, it will just expose only
one video node which can get a four times throughput upgrade later once
we implement multi-core support in the driver.

> 
> > 
> > > 
> > > I understand that this impose a rework of the mem2mem framework so that we can
> > > run multiple jobs, but this will be needed anyway on RK3588, since the rkvdec2,
> > > which we don't have a driver yet is also multi-core, but you need to use 2 cores
> > > when the resolution is close to 8K.
> > 
> > I think the mediatek JPEG driver already supports that, would it be ok
> > to do it the same way?
> 
> I don't know for JPEG, the MTK vcoder do support cascading cores. This is
> different from concurrent cores. In MTK architecture, for some of the codec,
> there is LAT (entropy decoder) and CORE (the reconstruction block) that are
> split.

Ah, that’s different then, I only had a cursory look at them back when I
implemented the JPEG decoder for sunxi.

> 
> Nicolas

-- 
Link Mauve