Hello,

While working on the Allwinner Video Engine H.264 encoder, I found that it has some pre-processing capabilities, including chroma down-sampling, colorspace conversion and scaling. For example, you can feed the encoder YUV 4:2:2 data and it will down-sample it to 4:2:0, since that is the only sub-sampling the hardware supports. The same applies when providing e.g. RGB source pictures, which are converted to YUV 4:2:0 internally.

I was wondering how all of this is currently dealt with and whether it should be a topic of attention. As far as I can see, there is currently no practical way for userspace to know that such down-sampling will take place, although this would be useful to know.

Would it make sense to have an additional media entity between the source video node and the encoder proc, with the actual pixel format configured on that link? (This would still be a video-node-centric device, so userspace would not be expected to configure that link.) But then, what if the hardware can either down-sample or keep the provided sub-sampling? How would userspace indicate which behavior to select? It is maybe not great to let userspace configure the pads when this is a video-node-centric driver. Perhaps this could be a control, or the driver could pick the least destructive sub-sampling available based on the selected codec profile (but that is still a guess that may not match the use case). With a control, we probably don't need an extra media entity.

Another topic is scaling. We can generally support scaling by allowing a different size on the coded queue after configuring the picture queue. However, there would be some interaction with the selection rectangle, which is used to set the cropping rectangle on the *source*. The driver would then need to take this rectangle and scale it to match the coded size. The main inconsistency is that the rectangle would no longer correspond to what is set in the bitstream, nor would the destination size, since it does not account for the cropping rectangle by definition.

It might be more sensible to have the selection rectangle operate on the coded/destination queue instead, but things are already specified the other way round. Maybe a selection rectangle could be introduced for the coded queue too, which would generally be propagated from the picture-side one, except in the case of scaling, where it would clarify the actual final size (the coded size taking the cropping into account). In this case the source selection rectangle would be understood as an actual source crop (which may not be supported by the hardware) instead of an indication for the codec metadata crop fields. The coded queue dimensions would then need to take this source cropping into account, which is somewhat contradictory with the current semantics.

Perhaps we could define that the source crop rectangle is entirely ignored when scaling is used, which would simplify things (although we would lose the ability to support source cropping when the hardware can do it). If operating on the source selection rectangle only (no second rectangle on the coded queue), some cases would be impossible to express, for instance going from aligned dimensions to unaligned ones (e.g. a 1280x720 source scaled to 1920x1088, where we want the codec cropping fields to indicate 1920x1080).

Anyway, I just wanted to check whether people have already thought about these topics. I'm mostly thinking out loud and I'm of course not saying we need to solve these problems now. To make some of the points above more concrete, I have appended a few rough sketches below.
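First, the sub-sampling point: here is a minimal sketch of the current stateful-encoder negotiation (the device node, sizes and formats are just placeholders, and error handling is omitted). Nothing in this sequence tells userspace that the 4:2:2 source will end up encoded as 4:2:0:

    #include <fcntl.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    int main(void)
    {
        int fd = open("/dev/video0", O_RDWR);
        struct v4l2_format fmt;

        /* Picture (OUTPUT) queue: raw 4:2:2 source frames. */
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        fmt.fmt.pix_mp.width = 1280;
        fmt.fmt.pix_mp.height = 720;
        fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV16; /* 4:2:2 */
        ioctl(fd, VIDIOC_S_FMT, &fmt);

        /* Coded (CAPTURE) queue: H.264 bitstream. The driver accepts
         * the 4:2:2 source but encodes 4:2:0 internally, and nothing
         * here exposes that to userspace. */
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
        fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_H264;
        ioctl(fd, VIDIOC_S_FMT, &fmt);

        return 0;
    }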
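The control option could then look something like the snippet below (continuing with the same fd). Note that the JPEG control class already has a similar concept with V4L2_CID_JPEG_CHROMA_SUBSAMPLING; the codec-class control ID and value here are entirely made up for illustration:

    /* Hypothetical control, loosely modeled on the existing JPEG-class
     * V4L2_CID_JPEG_CHROMA_SUBSAMPLING; no such codec-class control
     * exists today. */
    struct v4l2_control ctrl = {
        .id = V4L2_CID_MPEG_VIDEO_CHROMA_SUBSAMPLING, /* made up */
        .value = 0, /* e.g. 0 = keep the source sub-sampling */
    };
    ioctl(fd, VIDIOC_S_CTRL, &ctrl);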
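And here is the problematic scaling case spelled out (again continuing from the first sketch), assuming a hypothetical driver that accepts a coded size different from the picture size. The source-side crop rectangle is bounded by the source dimensions, so it simply cannot describe the 1920x1080 visible size we would want in the bitstream:

    /* Picture (OUTPUT) queue: 1280x720 source frames. */
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
    fmt.fmt.pix_mp.width = 1280;
    fmt.fmt.pix_mp.height = 720;
    fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV12;
    ioctl(fd, VIDIOC_S_FMT, &fmt);

    /* Source-side crop, which currently drives the bitstream cropping
     * fields: it cannot exceed 1280x720, so there is no way to express
     * a 1920x1080 visible size here. */
    struct v4l2_selection sel = {
        .type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
        .target = V4L2_SEL_TGT_CROP,
        .r = { .left = 0, .top = 0, .width = 1280, .height = 720 },
    };
    ioctl(fd, VIDIOC_S_SELECTION, &sel);

    /* Coded (CAPTURE) queue scaled up to 1920x1088: a second,
     * hypothetical selection rectangle on this queue (e.g. a compose
     * target) could carry the final 1920x1080 visible size. */
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
    fmt.fmt.pix_mp.width = 1920;
    fmt.fmt.pix_mp.height = 1088;
    fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_H264;
    ioctl(fd, VIDIOC_S_FMT, &fmt);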
Sorry again for the long email, I hope the points I'm making are somewhat understandable.

Cheers,

Paul

-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com