Hi all,

I hope you all had a good trip back from ELCE. I'm writing to follow up on, and gather ideas about, a blocker in our design regarding stateless CODECs and the VP9 CODEC. This email provides the context, and for those who would like to participate, I'd like to have a chat on IRC #v4l at 14:30 CET, for an hour at most. Hans needs to leave at 16h. If you can't attend, feel free to reply to this thread with ideas.

Context:
-------

VP9 has this concept that the resolution can be changed at any frame, including intra-frames. The consequence is that reference frames may not all be of the same resolution. What happens in practice is that the reference frames are scaled up, or down, to the decoding target using a fully defined scaling algorithm. In the context of Hantro (which, I should remind you, is likely the only VP9 HW decoder design in the world, considering you can get this design for free), this scaling is done on the fly. The reference frames are passed in their original size.

In our current design for stateless decoders, all references are held and owned by the VB queue, and referred to with a timestamp (or cookie). The problem is that, as of today, all buffers of a VB queue share the same format (despite some forward-looking attempts, like CREATE_BUFS).

Boris has implemented a proof of concept within the current API limitations, but we would like to find a way forward so that we can support a VP9-compliant implementation. The following are two ideas that have already come up, plus a third, more speculative one; more can be discussed tomorrow of course.

1. CREATE_BUFS/DELETE_BUFS
--------------------------

I haven't checked how this is exposed in the VB2 framework, but CREATE_BUFS was created with the idea that you could extend an existing pool of buffers with buffers of a different format. In order to complete this story, we'd need a DELETE_BUF, which would allow asking VB2 to drop its reference to a specific chunk of memory.
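The index question below can be made concrete with a minimal Python model of a queue whose buffers each carry their own format. This is only a sketch. `BufferPool`, `create_bufs()`, `delete_buf()` and `querybuf()` are hypothetical stand-ins for VB2 and the CREATE_BUFS/DELETE_BUF/QUERYBUF ioctls. The model also assumes, as one possible answer, that deleted indices are never shifted or reused.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Format:
    """Hypothetical, minimal stand-in for struct v4l2_format."""
    width: int
    height: int


class BufferPool:
    """Hypothetical model of a VB2 queue whose buffers may differ in format."""

    def __init__(self) -> None:
        self._bufs: Dict[int, Format] = {}  # buffer index -> its own format
        self._next_index = 0                # assumption: indices are never reused

    def create_bufs(self, count: int, fmt: Format) -> List[int]:
        """Extend the pool with `count` buffers of `fmt` (CREATE_BUFS-like)."""
        indices = list(range(self._next_index, self._next_index + count))
        for i in indices:
            self._bufs[i] = fmt
        self._next_index += count
        return indices

    def delete_buf(self, index: int) -> None:
        """Drop the pool's reference to one buffer (the proposed DELETE_BUF).

        Remaining indices are not shifted, so a hole is left behind.
        """
        del self._bufs[index]

    def querybuf(self, index: int) -> Format:
        """QUERYBUF would now need to report the per-buffer format."""
        return self._bufs[index]

    def indices(self) -> List[int]:
        """Indices currently allocated, in ascending order."""
        return sorted(self._bufs)
```

With this choice, deleting buffer 1 from a pool of four 1080p buffers and then creating two 720p buffers yields indices 4 and 5 and leaves a hole at 1. Both userspace and the kernel would have to keep track of that gap.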
For VP9, a resolution change would look like this (simplified):

- Userspace detects that the next frame has a different resolution
- Then DELETE_BUF any buffers that are no longer relevant
- Then TRY_FMT/CREATE_BUFS for the new resolution

As decoding continues and reference frames stop being relevant, userspace will make further DELETE_BUF calls. The STREAMOFF/STREAMON calls are no longer needed.

Pros:
- It's simple to use
- There is prior art in the API

Cons:
- QUERYBUF is now insufficient, as we need the format to be returned
- G_FMT becomes ambiguous
- It's unclear what to do with buffer indices: are they shifted?
- Userspace and kernel need to keep managing both buffer indices and timestamps (aka cookies), which seems quite error-prone
- The term DELETE might not match reality; maybe RELEASE?

2. Use a control to pass references
-----------------------------------

This idea came up in a previous discussion. We could introduce a control to set the 3 references in VP9. Along with each reference, we could pass the v4l2_format as it was when that reference frame was decoded. This would fully bypass the timestamp/cookie mechanism, but it would impose that VP9 only works with DMABuf, and that a flush/streamoff/re-alloc/streamon sequence happens on resolution changes. It would work if resolution changes are rare, e.g. not happening on consecutive frames.

Pros:
- Less invasive (uAPI/spec wise)

Cons:
- It's very expensive
- The memory mapping cache is lost, and needs to be re-implemented in the driver (or some helpers need to be exposed)
- It is inconsistent with H264/HEVC

3. Split buffer allocation and queue
------------------------------------

This is a bit of a crazy and unfinished idea. I'm writing it down just to feed some ideas, in the hope that somebody with the right knowledge (not me) might make some sense out of it.

What we could consider is to completely dissociate the queues from buffer allocation and buffer formats. With this idea, the queues would only serve as queues of pending operations.
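The split can be sketched as a small Python model. Frames are allocated independently and carry their own format and reference count. The queue merely holds pending decode operations that point at frames. All names here (`Frame`, `get()`/`put()`, `DecodeOp`, `DecodeQueue`) are hypothetical. Nothing like this exists in V4L2 today, and a real implementation would live in the kernel behind FDs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Format:
    """Hypothetical, minimal stand-in for struct v4l2_format."""
    width: int
    height: int


class Frame:
    """Hypothetical frame object: memory plus its format, reference counted."""

    def __init__(self, fmt: Format) -> None:
        self.fmt = fmt
        self.refcount = 1      # the creator's reference
        self.released = False

    def get(self) -> "Frame":
        """Take an extra reference."""
        assert not self.released, "frame already released"
        self.refcount += 1
        return self

    def put(self) -> None:
        """Drop a reference; the memory would be freed when it reaches zero."""
        self.refcount -= 1
        if self.refcount == 0:
            self.released = True


@dataclass
class DecodeOp:
    target: Frame
    references: List[Frame]


class DecodeQueue:
    """Queue of pending operations only: it owns no buffers and has no format."""

    def __init__(self) -> None:
        self._pending: Deque[DecodeOp] = deque()

    def queue(self, target: Frame, references: List[Frame]) -> None:
        # The queue holds its own references while the operation is pending.
        self._pending.append(DecodeOp(target.get(), [r.get() for r in references]))

    def complete_one(self) -> DecodeOp:
        op = self._pending.popleft()
        # A real decoder would scale each reference (r.fmt) to op.target.fmt
        # here; VP9 allows them to differ.
        for r in op.references:
            r.put()
        op.target.put()
        return op
```

Because each operation holds frame objects rather than timestamps, a 640x360 reference can feed a 1280x720 target without any lookup. A frame's memory is released only once its last holder drops it.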
I believe such an allocation model would require a kernel object, exposed to userspace as an FD, that wraps a DMABuf and stores all the relevant metadata, such as the video format for this "frame". Those familiar with DRM will see where the inspiration comes from. The wrapper is also a good place for any caching needed when importing buffers, so this is no longer cached in the queue. This would require a whole new set of IOCTLs to allocate and release these objects (we should start thinking in terms of reference counting rather than create/delete). As a side effect, these self-contained frames allow serializing format changes inside a queue.

In such a model, the reference frames no longer need to be in the queue, as they can be passed using their wrapper. With this, we basically get rid of the cookie/timestamp mechanism, which most of us dislike. The workflow is mostly identical to proposal 1; the difference is that the reference lookup code can be removed. The driver no longer needs to strictly track the buffers that have been allocated.

From the queue perspective, this would need to be a totally new type of capture/output. The v4l2_buffer would point to a frame object rather than a memory pointer/dmabuf/dma-offset.

Pros:
- Much more flexible model
- Helps with buffer sharing
- No more cookie/timestamp lookups all over the kernel and userspace
- A fully reference-counted model

Cons:
- This requires a lot of design; my idea is full of holes
- Can it really be implemented in parallel?
- Might have the same gruyere effect on buffer indices in the queue
- The io ops need to be refactored into something else

See you tomorrow,
Nicolas