Hi,

On Wed, 2024-09-25 at 22:14 +0200, Marek Vasut wrote:
> The userspace could distribute the frames between the two devices in an
> alternating manner, can it not ?

This doesn't help with latency, or when converting a single large frame.
For the deinterlacer, this can't be done with the motion-aware temporal
filtering modes. Those need a field from the previous frame.

> 
> Would the 1280x360 field be split into two tiles vertically and each
> tile (newly 1280/2 x 360) be enqueued on each VDIC ? I don't think that
> works, because you wouldn't be able to stitch those tiles back together
> nicely after the deinterlacing, would you? I would expect to see some
> sort of an artifact exactly where the two tiles got stitched back
> together, because the VDICs are unaware of each other and how each
> deinterlaced the tile.

I was thinking horizontally, two 640x720 tiles side by side. 1280 is
larger than the 968 pixel maximum horizontal resolution of the VDIC.
As you say, splitting vertically (which would be required for 1080i)
should cause artifacts at the seam due to the 4-tap vertical filter.

[...]
> > 
> > With the rigid V4L2 model though, where memory handling, parameter
> > calculation, and job scheduling of tiles in a single frame all have to
> > be hidden behind the V4L2 API, I don't think requiring userspace to
> > combine multiple mem2mem video devices to work together on a single
> > frame is feasible.
> 
> If your concern is throughput (from what I gathered from the text
> above), userspace could schedule frames on either VDIC in alternating
> manner.

Both throughput and latency. Yes, alternating to different devices would
help with throughput where possible, but it's worse for frame pacing, a
hassle to implement generically in userspace, and it's straight up
impossible with temporal filtering.
> I think this is much better and actually generic approach than trying to
> combine two independent devices on kernel level and introduce some sort
> of scheduler into kernel driver to distribute jobs between the two
> devices. Generic, because this approach works even if either of the two
> devices is not VDIC. Independent devices, because yes, the MX6Q IPUs are
> two independent blocks, it is only the current design of the IPUv3
> driver that makes them look kind-of like they are one single big device,
> I am not happy about that design, but rewriting the IPUv3 driver is way
> out of scope here. (*)

The IPUs are glued together at the capture and output paths, so yes,
they are independent blocks, but also work together as a big device.

> > Is limiting different users to the different deinterlacer hardware
> > units a real usecase? I saw the two ICs, when used as mem2mem devices,
> > as interchangeable resources.
> 
> I do not have that use case, but I can imagine it could come up.
> In my case, I schedule different cameras to different VDICs from
> userspace as needed.

Is this just because a single VDIC does not have enough throughput to
serve all cameras, or is there some reason for a fixed assignment
between cameras and VDICs?

> > > > To be fair, we never implemented that for the CSC/scaler mem2mem device
> > > > either.
> > > I don't think that is actually a good idea. Instead, it would be better
> > > to have two scaler nodes in userspace.
> > See above, that would make it impossible (or rather unreasonably
> > complicated) to distribute work on a single frame to both IPUs.
> Is your concern latency instead of throughput ? See my comment in
> paragraph (*) .

Either, depending on the use case.

[...]
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/media/imx/imx-media-vdic.c#n207
> > 
> > That code is unused.
> > The direct hardware path doesn't use
> > IPUV3_CHANNEL_MEM_VDI_PREV/CUR/NEXT, but it has a similar effect, half
> > of the incoming fields are dropped. The setup is vdic_setup_direct().
> 
> All right, let's drop that unused code then, I'll prepare a patch.

Thanks!

> But it seems the bottom line is, the VDI direct mode does not act as a
> frame-rate doubler ?

Yes, it can't. In direct mode, the VDIC only receives half of the
fields.

[...]
> 
> Why would adding the (configurable) frame-rate doubling mode break
> userspace if this is not the default ?

I'm not sure it would. Maybe there should be a deinterlacer control to
choose between full and half field rate output (aka frame doubling and
1:1 input to output frame rate).

Also, my initial assumption was that currently there is a 1:1 ratio of
input frames to output frames. But with temporal filtering enabled
there's already one input frame (the first one) that doesn't produce
any output.

> > > > If we don't start with that supported, I fear userspace will make
> > > > assumptions and be surprised when a full rate mode is added later.
> > > I'm afraid that since the current VDI already does retain input frame
> > > rate instead of doubling it, the userspace already makes an assumption,
> > > so that ship has sailed.
> > No, this is about the deinterlacer mem2mem device, which doesn't exist
> > before this series.
> I am not convinced it is OK if the direct VDI path and mem2mem VDI
> behave differently, that would be surprising to me as a user ?

Is this still about the frame rate doubling? Surely supporting it in
the mem2mem device and not in the capture path is ok. I'm not arguing
that frame doubling should be enabled by default.

> > The CSI capture path already has configurable framedrops (in the CSI).
> What am I looking for ? git grep doesn't give me any hits ? (**)

That's configured by the set_frame_interval pad op of the CSI subdevice
- on the IDMAC output pad. See csi_find_best_skip().
> > > But I think we can make the frame doubling configurable ?
> > 
> > That would be good. Specifically, there must be no guarantee that one
> > input frame with two fields only produces one deinterlaced output
> > frame, and userspace should somehow be able to understand this.
> 
> See my question (**) , where is this configurable framedrops thing ?

This would have to be done differently, though. Here we don't have
subdev set_frame_interval configuration, and while VIDIOC_S_PARM /
v4l2_captureparm were used to configure frame dropping on capture
devices, that's not really applicable to mem2mem deinterlacers.

> > I'd rather not default to the setting that throws away half of the
> > input data. Not using frame doubling by default is sensible, but now
> > that using all three input fields to calculate the output frame is
> > possible, why not make that the default.
> 
> To save memory bandwidth on the MX6, that's my main concern.

What userspace are you using to exercise this driver? Maybe we can back
this concern with a few numbers (or mine with pictures).

regards
Philipp