Re: V4L2 Encoders Pre-Processing Support Questions

Hi Nicolas,

Thanks for your helpful answer!

On Fri 20 Oct 23, 13:56, Nicolas Dufresne wrote:
> > For example this means that you can feed the encoder with YUV 4:2:2 data and
> > it will downsample it to 4:2:0 since that's the only thing the hardware can do.
> > It can also happen when e.g. providing RGB source pictures which will be
> > converted to YUV 4:2:0 internally.
> > 
> > I was wondering how all of this is dealt with currently and whether this should
> > be a topic of attention. As far as I can see there is currently no practical way
> > for userspace to know that such downsampling will take place, although this is
> > useful to know.
> 
> Userspace already knows that the driver will downsample through the selected
> profile. The only issue would be if a user wants to force a profile with 422
> support but has their 422 data downsampled anyway. This is legal in the spec,
> but I'd question whether it's worth supporting.

Yeah, indeed I think there's a distinction between selecting a profile that
allows 422 and ensuring that this is what the encoder selects. I'm not sure
whether 420 is always valid for every profile, but there's surely some overlap
where both could be selected in compliance with the profile.

> > Would it make sense to have an additional media entity between the source video
> > node and the encoder proc and have the actual pixel format configured in that
> > link (this would still be a video-centric device so userspace would not be
> > expected to configure that link). But then what if the hardware can either
> > down-sample or keep the provided sub-sampling? How would userspace indicate
> > which behavior to select? It is maybe not great to let userspace configure the
> > pads when this is a video-node-centric driver.
> > 
> > Perhaps this could be a control or the driver could decide to pick the least
> > destructive sub-sampling available based on the selected codec profile
> > (but this is still a guess that may not match the use case). With a control
> > we probably don't need an extra media entity.
> 
> Yes, for the cases not covered by the profile, I'd consider a control to force
> downsampling. A menu, so we can use the available menu items to enumerate
> what is supported.

Sounds good then.
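
For the fallback case you mentioned (the driver picking the least destructive
subsampling available), a rough sketch could look like this, with the menu
items of the hypothetical control modelled as a bitmask of hardware-supported
modes (all names below are made up for illustration):

```c
#include <assert.h>

enum subsampling { SUB_420, SUB_422, SUB_444 }; /* ranked by detail kept */

/* Keep the source subsampling if the hardware offers it, otherwise walk
 * down to the closest supported mode below it. */
static enum subsampling least_destructive(enum subsampling src,
					  unsigned int supported_mask)
{
	int s;

	for (s = src; s >= SUB_420; s--)
		if (supported_mask & (1u << s))
			return (enum subsampling)s;
	return SUB_420; /* last resort if the mask is empty */
}
```

So 4:2:2 input against 4:2:0-only hardware degrades to 4:2:0, while hardware
that supports both keeps the source subsampling untouched.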

> > Another topic is scaling. We can generally support scaling by allowing a
> > different size for the coded queue after configuring the picture queue.
> > However there would be some interaction with the selection rectangle, which is
> > used to set the cropping rectangle from the *source*. So the driver will need
> > to take this rectangle and scale it to match the coded size.
> > 
> > The main inconsistency here is that the rectangle would no longer correspond to
> > what will be set in the bitstream, nor would the destination size since it does
> > not count the cropping rectangle by definition. It might be more sensible to
> > have the selection rectangle operate on the coded/destination queue instead,
> > but things are already specified to be the other way round.
> > 
> > Maybe a selection rectangle could be introduced for the coded queue too, which
> > would generally be propagated from the picture-side one, except in the case of
> > scaling where it would be used to clarify the actual final size (coded size
> > taking the cropping into account). In this case the source selection rectangle
> > would be understood as an actual source crop (may not be supported by hardware)
> > instead of an indication for the codec metadata crop fields. And the coded
> > queue dimensions would need to take this source cropping into account, which is
> > kinda contradictory with the current semantics. Perhaps we could define that
> > the source crop rectangle should be entirely ignored when scaling is used,
> > which would simplify things (although we lose the ability to support source
> > cropping if the hardware can do it).
> 
> Yes, we should use selection on both queues (fortunately there is a
> v4l2_buf_type in that API). Otherwise we cannot model all the scaling and
> cropping options.
> What the spec must do is define the configuration sequence, so that a
> negotiation is possible. We need a convention regarding the order, so that there
> is a way to converge with the driver, and also to conclude if the driver cannot
> handle it.

Agreed. I'm just a bit worried that it's rather late to change the semantics
now that the source crop is defined in the stateful encoding uAPI, and that its
meaning would become unclear/different when a destination crop is added.
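
For what it's worth, the arithmetic that makes a single source-side rectangle
insufficient is easy to show. Below is a minimal sketch (a hypothetical helper,
not existing uAPI) of the crop scaling a driver would have to perform when the
coded size differs from the picture size:

```c
#include <assert.h>
#include <stdint.h>

struct rect {
	uint32_t left, top, width, height;
};

/* Hypothetical helper: scale a source-side crop rectangle into coded-queue
 * coordinates.  Plain integer scaling, no alignment constraints. */
static struct rect scale_crop(struct rect src,
			      uint32_t src_w, uint32_t src_h,
			      uint32_t dst_w, uint32_t dst_h)
{
	struct rect r = {
		.left   = src.left   * dst_w / src_w,
		.top    = src.top    * dst_h / src_h,
		.width  = src.width  * dst_w / src_w,
		.height = src.height * dst_h / src_h,
	};
	return r;
}
```

Scaling a full 1280x720 source crop to a 1920x1088 coded size can only ever
yield 1920x1088, never the 1920x1080 the bitstream cropping fields should
carry, which is exactly the case a second rectangle on the coded queue would
resolve.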

Cheers,

Paul

> > If operating on the source selection rectangle only (no second rectangle on the
> > coded queue) some cases would be impossible to reach, for instance going from
> > some aligned dimensions to unaligned ones (e.g. 1280x720 source scaled to
> > 1920x1088 and we want the codec cropping fields to indicate 1920x1080).
> > 
> > Anyway just wanted to check if people have already thought about these topics,
> > but I'm mostly thinking out loud and I'm of course not saying we need to solve
> > these problems now.
> 
> We might find extra corner cases by implementing the spec, but I think the API
> we have makes most of this possible already. Remember that we have the fwht sw
> codec in the kernel for the purpose of developing this kind of feature. A
> simple bob scaler can be added for testing scaling.
> 
> > 
> > Sorry again for the long email, I hope the points I'm making are somewhat
> > understandable.
> > 
> > Cheers,
> > 
> > Paul
> > 
> 
> regards,
> Nicolas
> 

-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com
