On Mon, Oct 22, 2018 at 3:51 PM Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
>
> On Mon, Oct 22, 2018 at 3:39 PM Alexandre Courbot <acourbot@xxxxxxxxxxxx> wrote:
> >
> > On Mon, Oct 22, 2018 at 3:22 PM Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 22, 2018 at 3:05 PM Alexandre Courbot <acourbot@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Oct 19, 2018 at 5:44 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > > >
> > > > > On 10/19/18 10:09, Alexandre Courbot wrote:
> > > > > > Thanks everyone for the feedback on v2! I have not replied to all the
> > > > > > individual emails but hope this v3 will address some of the problems
> > > > > > raised and become a continuation point for the topics still in
> > > > > > discussion (probably during the ELCE Media Summit).
> > > > > >
> > > > > > This patch documents the protocol that user-space should follow when
> > > > > > communicating with stateless video decoders. It is based on the
> > > > > > following references:
> > > > > >
> > > > > > * The current protocol used by Chromium (converted from config store to
> > > > > >   request API)
> > > > > >
> > > > > > * The submitted Cedrus VPU driver
> > > > > >
> > > > > > As such, some things may not be entirely consistent with the current
> > > > > > state of drivers, so it would be great if all stakeholders could point
> > > > > > out these inconsistencies. :)
> > > > > >
> > > > > > This patch is supposed to be applied on top of the Request API V18 as
> > > > > > well as the memory-to-memory video decoder interface series by Tomasz
> > > > > > Figa.
> > > > > >
> > > > > > Changes since v2:
> > > > > >
> > > > > > * Specify that the frame header controls should be set prior to
> > > > > >   enumerating the CAPTURE queue, instead of the profile which as Paul
> > > > > >   and Tomasz pointed out is not enough to know which raw formats will be
> > > > > >   usable.
> > > > > > * Change V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM to
> > > > > >   V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS.
> > > > > > * Various rewording and rephrasing
> > > > > >
> > > > > > Two points being currently discussed have not been changed in this
> > > > > > revision due to lack of a better idea. Of course this is open to change:
> > > > > >
> > > > > > * The restriction of having to send full frames for each input buffer is
> > > > > >   kept as-is. As Hans pointed out, we currently have a hard limit of 32
> > > > > >   buffers per queue, and it may be non-trivial to lift. Also some codecs
> > > > > >   (at least Venus AFAIK) do have this restriction in hardware, so unless
> > > > > >   we want to do some buffer-rearranging in-kernel, it is probably better
> > > > > >   to keep the default behavior as-is. Finally, relaxing the rule should
> > > > > >   be easy enough if we add one extra control to query whether the
> > > > > >   hardware can work with slice units, as opposed to frame units.
> > > > >
> > > > > Makes sense, as long as the restriction can be lifted in the future.
> > > >
> > > > Lifting this limitation once we support more than 32 buffers should
> > > > not be an issue. Just add a new capability control and process things
> > > > in slice units. Right now we have hardware that can only work with
> > > > whole frames (venus)
> > >
> > > Note that venus is stateful hardware and the restriction might just
> > > come from the firmware.
> >
> > Right, and it most certainly does indeed. Yet firmwares are not always
> > easy to get updated by vendors, so we may have to deal with it anyway.
> >
>
> Right. I'm just not convinced that venus is relevant to the stateless
> interface.
>
> I'd use the Rockchip VPU hardware as an example of hardware that
> seems to require all the slices in one buffer, tightly packed one
> after another, since it only accepts one address and size into its
> registers.

You're right that Venus is not relevant here. Rockchip VPU is a much
better example.

> > > >
> > > > but I suspect that some slice-only hardware must
> > > > exist, so it may actually become a necessity at some point (lest
> > > > drivers do some splitting themselves).
> > > >
> > >
> > > The drivers could do it trivially, because the UAPI will include the
> > > array of slices, with offsets and sizes. It would just run the same
> > > OUTPUT buffer multiple times, once for each slice.
> >
> > Alignment issues notwithstanding. :)
>
> Good point. That could be handled by a memcpy() into a bounce buffer,
> but it wouldn't be as trivial as I suggested anymore indeed.

But at least the kernel won't have to try and parse the stream since
the slices' limits are clearly defined by user-space, so it's not that
big a deal.
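
For what it is worth, the per-slice fallback discussed above could look
roughly like the sketch below. It is plain user-space C that only
simulates the idea and is not driver code: HW_ALIGN, struct slice_desc,
run_hw_fn, decode_per_slice() and count_bytes() are all made-up names,
and the 16-byte alignment is an arbitrary assumption rather than a
property of any real decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical alignment the decoder requires for the bitstream address. */
#define HW_ALIGN 16u

/* Hypothetical per-slice descriptor, as user-space would describe it. */
struct slice_desc {
	size_t offset;	/* byte offset of the slice in the OUTPUT buffer */
	size_t size;	/* size of the slice in bytes */
};

/* Hypothetical hook standing in for "program address + size, then run". */
typedef void (*run_hw_fn)(const uint8_t *addr, size_t size, void *priv);

/*
 * Run the "hardware" once per slice of the same OUTPUT buffer.
 * Slices that already start on an aligned address are used in place;
 * the others are first copied to the aligned bounce buffer.
 * Returns the number of slices that needed a copy, or -1 on bad input.
 */
static int decode_per_slice(const uint8_t *buf, size_t buf_size,
			    const struct slice_desc *slices, size_t n,
			    uint8_t *bounce, size_t bounce_size,
			    run_hw_fn run_hw, void *priv)
{
	int copies = 0;

	/* The bounce buffer itself must satisfy the alignment. */
	if ((uintptr_t)bounce % HW_ALIGN)
		return -1;

	for (size_t i = 0; i < n; i++) {
		const struct slice_desc *s = &slices[i];
		const uint8_t *src;

		/* Reject slices that fall outside the OUTPUT buffer. */
		if (s->offset > buf_size || s->size > buf_size - s->offset)
			return -1;

		src = buf + s->offset;
		if ((uintptr_t)src % HW_ALIGN) {
			/* Misaligned start: go through the bounce buffer. */
			if (s->size > bounce_size)
				return -1;
			memcpy(bounce, src, s->size);
			src = bounce;
			copies++;
		}
		run_hw(src, s->size, priv);
	}
	return copies;
}

/* Example hook: only accumulates how many bytes were handed over. */
static void count_bytes(const uint8_t *addr, size_t size, void *priv)
{
	(void)addr;
	*(size_t *)priv += size;
}
```

No bitstream parsing happens anywhere: the loop only trusts (and
bounds-checks) the offsets and sizes provided by user-space, and the
memcpy() is paid only for slices that start on a misaligned address.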