Hi Paul, On Tue, Apr 16, 2019 at 4:55 PM Paul Kocialkowski <paul.kocialkowski@xxxxxxxxxxx> wrote: > > Hi, > > Le mardi 16 avril 2019 à 16:22 +0900, Alexandre Courbot a écrit : > > [...] > > > Thanks for this great discussion. Let me try to summarize the status > > of this thread + the IRC discussion and add my own thoughts: > > > > Proper support for multiple decoding units (e.g. H.264 slices) per > > frame should not be an afterthought ; compliance to encoded formats > > depend on it, and the benefit of lower latency is a significant > > consideration for vendors. > > > > m2m, which we use for all stateless codecs, has a strong assumption > > that one OUTPUT buffer consumed results in one CAPTURE buffer being > > produced. This assumption can however be overruled: at least the venus > > driver does it to implement the stateful specification. > > > > So we need a way to specify frame boundaries when submitting encoded > > content to the driver. One request should contain a single OUTPUT > > buffer, containing a single decoding unit, but we need a way to > > specify whether the driver should directly produce a CAPTURE buffer > > from this request, or keep using the same CAPTURE buffer with > > subsequent requests. > > > > I can think of 2 ways this can be expressed: > > 1) We keep the current m2m behavior as the default (a CAPTURE buffer > > is produced), and add a flag to ask the driver to change that behavior > > and hold on the CAPTURE buffer and reuse it with the next request(s) ; > > That would kind of break the stateless idea. I think we need requests > to be fully independent of eachother and have some entity that > coordinates requests for this kind of things. Side note: the idea that stateless decoders are entirely stateless is not completely accurate anyway. When we specify a resolution on the OUTPUT queue, we already store some state. What matters IIUC is that the *hardware* behaves in a stateless manner. I don't think we should refrain from storing some internal driver state if it makes sense. Back to the topic: the effect of this flag would just be that the first buffer is the CAPTURE queue is not removed, i.e. the next request will work on the same buffer. It doesn't really preserve any state - if the next request is the beginning of a different frame, then the previous work will be discarded and the driver will behave as it should, not considering any previous state. > > > 2) We specify that no CAPTURE buffer is produced by default, unless a > > flag asking so is specified. > > > > The flag could be specified in one of two ways: > > a) As a new v4l2_buffer.flag for the OUTPUT buffer ; > > b) As a dedicated control, either format-specific or more common to all codecs. > > I think we must aim for a generic solution that would be at least > common to all codecs, and if possible common to requests regardless of > whether they concern video decoding or not. > > I really like the idea of introducing a requests batch/group/queue, > which groups requests together and allows marking them done when the > whole group is done being decoded. For that, we explicitly mark one of > the requests as the final one, so that we can continue adding requests > to the batch even when it's already being processed. When all the > requests are done being decoded, we can mark them done. I'd need to see this idea more developed (with maybe an example of the sequence of IOCTLs) to form an opinion about it. Also would need to be given a few examples of where this could be used outside of stateless codecs. Then we will have to address what this means for requests: your argument against using a "release CAPTURE buffer" flag was that requests won't be fully independent from each other anymore, but I don't see that situation changing with batches. Then, does the end of a batch only means that a CAPTURE buffer should be released, or are other actions required for non-codec use-cases? There are lots and lots of questions like this one lurking. > > With that, we also need some tweaking in the core to look for an > available capture buffer that matches the output buffer's timestamp > before trying to dequeue the next available capture buffer I don't think that would be strictly necessary, unless we want to be able to decode slices from different frames before the first one is completed? > This way, > the first request of the batch will get any queued capture buffer, but > subsequent requests will find the matchung capture buffer by timestamp. > > I think that's basically all we need to handle that and the two aspects > (picking by timestamp and requests groups) are rather independent and > the latter could probably be used in other situations than video > decoding. > > What do you think? At the current point I'd like to avoid over-engineering things. Introducing a request batch mechanism would mean more months spent before we can set the stateless codec API in stone, and at some point we need to settle and release something that people can use. We don't even have clear idea of what batches would look like and in which cases they would be used. The idea of an extra flag is simple and AFAICT would do the job nicely, so why not proceed with this for the time being? Cheers, Alex.