Hi Paul,

On Thursday, 19 October 2023 at 11:39 +0200, Paul Kocialkowski wrote:
> Hello,
>
> While working on the Allwinner Video Engine H.264 encoder, I found that
> it has some pre-processing capabilities. This includes things like
> chroma down-sampling, colorspace conversion and scaling.

Same with the Hantro H1.

> For example this means that you can feed the encoder with YUV 4:2:2
> data and it will downsample it to 4:2:0, since that's the only thing
> the hardware can do. It can also happen when e.g. providing RGB source
> pictures, which will be converted to YUV 4:2:0 internally.
>
> I was wondering how all of this is dealt with currently and whether
> this should be a topic of attention. As far as I can see there is
> currently no practical way for userspace to know that such downsampling
> will take place, although this is useful to know.

Userspace already knows that the driver will downsample, through the
selected profile: if the negotiated profile cannot carry 4:2:2, the data
will necessarily be subsampled to 4:2:0. The only problematic case would
be a user forcing a profile with 4:2:2 support but still getting their
4:2:2 data downsampled. This is legal in the spec, but I'd question
whether it's worth supporting.

> Would it make sense to have an additional media entity between the
> source video node and the encoder proc and have the actual pixel format
> configured in that link? (This would still be a video-node-centric
> device, so userspace would not be expected to configure that link.)
> But then what if the hardware can either down-sample or keep the
> provided sub-sampling? How would userspace indicate which behavior to
> select? It is maybe not great to let userspace configure the pads when
> this is a video-node-centric driver.
>
> Perhaps this could be a control, or the driver could decide to pick the
> least destructive sub-sampling available based on the selected codec
> profile (but this is still a guess that may not match the use case).
> With a control we probably don't need an extra media entity.

Yes, for the cases not covered by the profile, I'd consider a control to
force downsampling. A menu control, so that userspace can enumerate the
available menu items to discover what is supported.
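To illustrate, here is a minimal sketch of what I mean. Note that the
control ID is deliberately left as a parameter: no such control exists
today, it stands in for whatever ID we would standardize. The rest is
the regular V4L2 control API:

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* cid would be the (yet-to-be-defined) chroma downsampling menu
 * control ID; this function only shows the enumeration pattern. */
static void enum_downsampling_modes(int fd, __u32 cid)
{
	struct v4l2_queryctrl qctrl = { .id = cid };
	__s32 i;

	/* No control means the driver never forces downsampling beyond
	 * what the selected profile already implies. */
	if (ioctl(fd, VIDIOC_QUERYCTRL, &qctrl) < 0)
		return;

	for (i = qctrl.minimum; i <= qctrl.maximum; i++) {
		struct v4l2_querymenu qmenu = { .id = cid, .index = i };

		/* Indices rejected with EINVAL are modes the hardware
		 * does not implement. */
		if (ioctl(fd, VIDIOC_QUERYMENU, &qmenu) == 0)
			printf("mode %d: %s\n", i,
			       (const char *)qmenu.name);
	}
}

This way a driver only exposes the subsamplings its pre-processor can
actually produce, and the default menu item can be the least destructive
one.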
> Another topic is scaling. We can generally support scaling by allowing
> a different size for the coded queue after configuring the picture
> queue. However there would be some interaction with the selection
> rectangle, which is used to set the cropping rectangle from the
> *source*. So the driver will need to take this rectangle and scale it
> to match the coded size.
>
> The main inconsistency here is that the rectangle would no longer
> correspond to what will be set in the bitstream, nor would the
> destination size, since it does not count the cropping rectangle by
> definition. It might be more sensible to have the selection rectangle
> operate on the coded/destination queue instead, but things are already
> specified to be the other way round.
>
> Maybe a selection rectangle could be introduced for the coded queue
> too, which would generally be propagated from the picture-side one,
> except in the case of scaling where it would be used to clarify the
> actual final size (coded size taking the cropping into account). In
> this case the source selection rectangle would be understood as an
> actual source crop (which may not be supported by hardware) instead of
> an indication for the codec metadata crop fields. And the coded queue
> dimensions would need to take this source cropping into account, which
> is kind of contradictory with the current semantics. Perhaps we could
> define that the source crop rectangle should be entirely ignored when
> scaling is used, which would simplify things (although we lose the
> ability to support source cropping if the hardware can do it).

Yes, we should use selection on both queues (fortunately there is a
v4l2_buf_type in that API); otherwise we cannot model all the scaling
and cropping options. What the spec must do is define the configuration
sequence, so that a negotiation is possible. We need a convention
regarding the order, so that there is a way to converge with the driver
and also a way to conclude that the driver cannot handle a given setup.
(I've put a rough sketch of what I mean at the end of this mail.)

> If operating on the source selection rectangle only (no second
> rectangle on the coded queue), some cases would be impossible to
> reach, for instance going from aligned dimensions to unaligned ones
> (e.g. a 1280x720 source scaled to 1920x1088 where we want the codec
> cropping fields to indicate 1920x1080).
>
> Anyway, I just wanted to check if people have already thought about
> these topics, but I'm mostly thinking out loud and I'm of course not
> saying we need to solve these problems now.

We might find extra corner cases by implementing the spec, but I think
the API we have already makes most of this possible. Remember that we
have the fwht software codec in the kernel for the purpose of
developing this kind of feature. A simple bob scaler could be added to
it for testing scaling.

> Sorry again for the long email, I hope the points I'm making are
> somewhat understandable.
>
> Cheers,
>
> Paul

regards,
Nicolas
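P.S. Here is a rough sketch of the selection sequence I have in mind for
your 1280x720 -> 1920x1088/1920x1080 example. Using V4L2_SEL_TGT_CROP on
the OUTPUT queue and V4L2_SEL_TGT_COMPOSE on the CAPTURE queue is my
assumption of how the semantics could be mapped; that mapping is exactly
what the spec would need to nail down:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Assumes the picture format (1280x720) was already set on the OUTPUT
 * queue and the coded format (1920x1088, macroblock-aligned) on the
 * CAPTURE queue via VIDIOC_S_FMT. */
static int setup_scaling(int fd)
{
	struct v4l2_selection sel;

	/* Source rectangle: an actual crop of the input pictures, no
	 * longer a hint for the codec metadata crop fields. */
	memset(&sel, 0, sizeof(sel));
	sel.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
	sel.target = V4L2_SEL_TGT_CROP;
	sel.r.left = 0;
	sel.r.top = 0;
	sel.r.width = 1280;
	sel.r.height = 720;
	if (ioctl(fd, VIDIOC_S_SELECTION, &sel) < 0)
		return -1;

	/* Coded-queue rectangle: the visible size inside the aligned
	 * coded size, i.e. what ends up in the bitstream cropping
	 * fields (1920x1080 inside 1920x1088). */
	memset(&sel, 0, sizeof(sel));
	sel.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	sel.target = V4L2_SEL_TGT_COMPOSE;
	sel.r.left = 0;
	sel.r.top = 0;
	sel.r.width = 1920;
	sel.r.height = 1080;
	if (ioctl(fd, VIDIOC_S_SELECTION, &sel) < 0)
		return -1;

	/* The driver may adjust both rectangles; userspace then reads
	 * them back with VIDIOC_G_SELECTION to check whether the
	 * negotiation converged. */
	return 0;
}

The interesting part is the convention: formats first, then the OUTPUT
crop, then the CAPTURE compose rectangle, with each step allowed to
adjust the later ones. That gives userspace a deterministic way to
converge, or to detect that the driver cannot do what was asked.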