Hi,

> [...]
> > In cable streaming notably, the RC job is to monitor the about of bits over a
> > period of time (the window). This window is defined by the streaming hardware
> > buffering capabilities. Best at this point is to start reading through HRD
> > specifications, and open source rate control implementation (notably x264).
> >
> > I think overall, we can live with adding hints were needed, and if the gop
> > information is appropriate hint, then we can just reuse the existing control.
> >
> Why we still care about GOP here. Hardware have no idea about GOP at
> all. Although in codec likes HEVC, IDR and intra pictures's nalu header
> is different, there is not different in the hardware coding
> configration. NALU header is generated by the userspace usually.
>
> While future encoding would regard the current encoded picture as an IDR
> is completed decided by the userspace.

The discussion was about having a basic RC algorithm in the kernel driver,
possibly making use of hardware-specific features without actually exposing
them all to userspace. So, assuming we do that:

Paul's concern is that, for best results, an RC algorithm could use knowledge
of keyframe placement to preserve bucket space (possibly using the last
keyframe size as a hint). Exposing the GOP structure in some form allows
"prediction", so the adaptation can look ahead at the future budget without
introducing latency. There is an alternative, which is to require encode
requests to be queued ahead of time. But this does introduce latency since,
the way it works in V4L2 today, the picture needs to be filled by the time we
request an encode.

Though, if we drop the GOP structure and favour this approach, the latency
could be regained later by introducing fence-based streaming. The technique
would be for a video source (like a capture driver) to pass dmabufs that
aren't filled yet, but have a companion fence.
This would allow queuing requests ahead of time, and all we need is enough
pre-allocation to accommodate the desired lookahead. The only issue is that
this perhaps violates the fundamental principle of "short term" delivery of
fences. But fences can also fail, I think, in case the capture was stopped.

We can certainly move forward with this as a future solution, or simply not
implement a future-aware RC algorithm, in order to avoid the huge task this
involves (and possibly patents?)

[...]

> >
> > Of course, the subject is much more relevant when there is encoders with more
> > then 1 reference. But you are correct, what the commands do, is allow to change,
> > add or remove any reference from the list (random modification), as long as they
> > fit in the codec contraints (like the DPB size notably). This is the only way
> > one can implement temporal SVC reference pattern, robust reference trees or RTP
> > RPSI. Note that long term reference also exists, and are less complex then these
> > commands.
> >
>
> If we the userspace could manage the lifetime of reconstruction
> buffers(assignment, reference), we don't need a command here.

Sorry if I created confusion, the comment was specific to H.264 coding. It's
a compressed form of the reference lists. This information is coded in the
slice header and enabled through adaptive_ref_pic_marking_mode_flag.

It has been suggested so far to leave H.264 slice header writing to the
driver. This is motivated by the H.264 slice header not being byte aligned in
size, so it is hard to combine with the slice_data(). Also, some hardware
actually produces the slice_header. This needs actual analysis of the hardware
interfaces, because an H.264 slice header is worth nothing if it cannot
instruct the decoder how to maintain the desired reference state. I think this
aspect should probably not be generalized to all codecs, since the packing
semantics can differ widely.
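
As a rough illustration of the alignment problem, here is a small sketch
(Python for brevity, not driver code). The BitWriter class, the
append_payload() helper and the 13-bit and 16-bit header widths are all made
up for the example; a real H.264 slice header has many more fields and uses
Exp-Golomb codes, and the sketch assumes no alignment bits sit between header
and data.

```python
class BitWriter:
    """Accumulates bits MSB-first, the bit order used by H.264 bitstreams."""

    def __init__(self):
        self.bits = []

    def put(self, value, nbits):
        # Append the nbits low-order bits of value, most significant first.
        for shift in range(nbits - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def put_bytes(self, data):
        for byte in data:
            self.put(byte, 8)

    def to_bytes(self):
        # Zero-pad the tail to a whole byte so the result can be inspected.
        padded = self.bits + [0] * (-len(self.bits) % 8)
        return bytes(
            int("".join(map(str, padded[i:i + 8])), 2)
            for i in range(0, len(padded), 8)
        )


def append_payload(header, payload):
    """Return the header bits followed immediately by the payload bytes."""
    out = BitWriter()
    out.bits = list(header.bits)
    out.put_bytes(payload)
    return out.to_bytes()


# Stand-in for already-encoded slice data (hypothetical content).
payload = bytes([0xAB, 0xCD])

# Hypothetical 13-bit header: not a multiple of 8 bits, so every payload
# byte ends up straddling a byte boundary in the combined stream.
unaligned = BitWriter()
unaligned.put(0b1011001110001, 13)

# Hypothetical 16-bit header: the payload can be appended untouched.
aligned = BitWriter()
aligned.put(0xB371, 16)

unaligned_out = append_payload(unaligned, payload)
aligned_out = append_payload(aligned, payload)

print(aligned_out.hex(), payload in aligned_out)
print(unaligned_out.hex(), payload in unaligned_out)
```

In the aligned case the payload bytes survive verbatim and a plain
concatenation would have done the job. In the unaligned case whoever joins
the two parts has to re-shift the whole payload, which is why the split
between driver and application is awkward there.
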
When the codec header is indeed byte aligned, it can easily be separated and
combined by the application, which improves application flexibility and
reduces kernel API complexity.

>
> It is just a problem of how to design another request API control
> structure to select which buffers would be used for list0, list1.
> > I this raises a big question, and I never checked how this worked with let's say
> > VA. Shall we let the driver resolve the changes into commands (VP8 have
> > something similar, while VP9 and AV1 are refresh flags, which are just trivial
> > to compute). I believe I'll have to investigate this further.
> >
> > [...]

regards,
Nicolas