On 2019-06-04 11:38, Boris Brezillon wrote: > On Tue, 4 Jun 2019 09:15:28 +0000 > Jonas Karlman <jonas@xxxxxxxxx> wrote: > >> On 2019-06-04 11:06, Thierry Reding wrote: >>> On Tue, Jun 04, 2019 at 10:49:21AM +0200, Boris Brezillon wrote: >>>> On Tue, 4 Jun 2019 10:31:57 +0200 >>>> Thierry Reding <thierry.reding@xxxxxxxxx> wrote: >>>> >>>>>>>>> - Using flags >>>>>>>>> >>>>>>>>> The current MPEG-2 controls have lots of u8 values that can be >>>>>>>>> represented as flags. Using flags also helps with padding. >>>>>>>>> It's unlikely that we'll get more than 64 flags, so using a u64 by >>>>>>>>> default for that sounds fine (we definitely do want to keep some room >>>>>>>>> available and I don't think using 32 bits as a default is good enough). >>>>>>>>> >>>>>>>>> I think H.264/HEVC per-control flags should also be moved to u64. >>>>>>>> There was also some concensus on this, that u64 should be good enough >>>>>>>> for anything out there, though we obviously don't know what the future >>>>>>>> will hold, so perhaps adding some way for possible extending this in the >>>>>>>> future might be good. I guess we'll get new controls for new codecs >>>>>>>> anyway, so we can punt on this until then. >>>>>>>> >>>>>>>>> - Clear split of controls and terminology >>>>>>>>> >>>>>>>>> Some codecs have explicit NAL units that are good fits to match as >>>>>>>>> controls: e.g. slice header, pps, sps. I think we should stick to the >>>>>>>>> bitstream element names for those. >>>>>>>>> >>>>>>>>> For H.264, that would suggest the following changes: >>>>>>>>> - renaming v4l2_ctrl_h264_decode_param to v4l2_ctrl_h264_slice_header; >>>>>>>>> - killing v4l2_ctrl_h264_decode_param and having the reference lists >>>>>>>>> where they belong, which seems to be slice_header; >>>>>>> But now here it's being described per slice. When I look at the slice >>>>>>> header, I only see list of modifications and when I look at userspace, >>>>>>> That list is simply built from DPB, the modifications list found in the >>>>>>> slice header seems to be only used to craft the l0/l1 list. >>>>>> Yes, I think there was a misunderstanding which was then clarified >>>>>> (unfortunately it happened on IRC, so we don't have a trace of this >>>>>> discussion). The reference list should definitely be per-frame, and the >>>>>> L0/L1 slice reflists are referring to the per-frame reference list (it's >>>>>> just a sub-set of the per-frame reflist re-ordered differently). >>>>>> >>>>>>> There is one thing that come up though, if we enable per-frame decoding >>>>>>> on top of per-slice decoder (like Cedrus), won't we force userspace to >>>>>>> always compute l0/l1 even though the HW might be handling that ? >>>>>> That's true, the question is, what's the cost of this extra re-ordering? >>>>> I think ultimately userspace is already forced to compute these lists >>>>> even if some hardware may be able to do it in hardware. There's going to >>>>> be other hardware that userspace wants to support that can't do it by >>>>> itself, so userspace has to at least have the code anyway. What it could >>>>> do on top of that decide not to run that code if it somehow detects that >>>>> hardware can do it already. On the other hand this means that we have to >>>>> expose a whole lot of capabilities to userspace and userspace has to go >>>>> and detect all of them in order to parameterize all of the code. >>>>> >>>>> Ultimately I suspect many applications will just choose to pass the data >>>>> all the time out of simplicity. I mean drivers that don't need it will >>>>> already ignore it (i.e. they must not break if they get the extra data) >>>>> so other than the potential runtime savings on some hardware, there are >>>>> no advantages. >>>>> >>>>> Given that other APIs don't bother exposing this level of control to >>>>> applications makes me think that it's just not worth it from a >>>>> performance point of view. >>>> That's not exactly what Nicolas proposed. He was suggesting that we >>>> build those reflists kernel-side: V4L would provide an helper and >>>> drivers that need those lists would use it, others won't. This way we >>>> have no useless computation done, and userspace doesn't even have to >>>> bother checking the device caps to avoid this extra step. >>> Oh yeah, that sounds much better. I suppose one notable differences to >>> other APIs is that they have to pass in buffers for all the frames in >>> the DPB, so they basically have to build the lists in userspace. Since >>> we'll end up looking up the frames in the kernel, it sounds reasonable >>> to also build the lists in the kernel. >> Userspace must already process the modification list or it wont have correct DPB for next frame. > Can you point us to the code or the section in the spec that > mentions/proves this dependency? I might have missed something, but my > understanding was that the slice ref lists (or the list of > modifications to apply to the list of long/short refs attached to a > frame) had no impact on the list of long/short refs attached to the > following frame. I must have mixed up the marking process with the modification process. You seem to be correct, the modification process should not affect the short/long-term reference marking if I understand the spec and code correctly. Regards, Jonas