On 12.03.2018 11:29, Tomasz Figa wrote: > On Mon, Mar 12, 2018 at 5:25 PM, Paul Kocialkowski > <paul.kocialkowski@xxxxxxxxxxx> wrote: >> Hi, >> >> On Mon, 2018-03-12 at 17:15 +0900, Tomasz Figa wrote: >>> Hi Paul, Dmitry, >>> >>> On Mon, Mar 12, 2018 at 5:10 PM, Paul Kocialkowski >>> <paul.kocialkowski@xxxxxxxxxxx> wrote: >>>> Hi, >>>> >>>> On Sun, 2018-03-11 at 22:42 +0300, Dmitry Osipenko wrote: >>>>> Hello, >>>>> >>>>> On 07.03.2018 19:37, Paul Kocialkowski wrote: >>>>>> Hi, >>>>>> >>>>>> First off, I'd like to take the occasion to say thank-you for >>>>>> your >>>>>> work. >>>>>> This is a major piece of plumbing that is required for me to add >>>>>> support >>>>>> for the Allwinner CedarX VPU hardware in upstream Linux. Other >>>>>> drivers, >>>>>> such as tegra-vde (that was recently merged in staging) are also >>>>>> badly >>>>>> in need of this API. >>>>> >>>>> Certainly it would be good to have a common UAPI. Yet I haven't >>>>> got my >>>>> hands on >>>>> trying to implement the V4L interface for the tegra-vde driver, >>>>> but >>>>> I've taken a >>>>> look at Cedrus driver and for now I've one question: >>>>> >>>>> Would it be possible (or maybe already is) to have a single IOCTL >>>>> that >>>>> takes input/output buffers with codec parameters, processes the >>>>> request(s) and returns to userspace when everything is done? >>>>> Having 5 >>>>> context switches for a single frame decode (like Cedrus VAAPI >>>>> driver >>>>> does) looks like a bit of overhead. >>>> >>>> The V4L2 interface exposes ioctls for differents actions and I don't >>>> think there's a combined ioctl for this. The request API was >>>> introduced >>>> precisely because we need to have consistency between the various >>>> ioctls >>>> needed for each frame. Maybe one single (atomic) ioctl would have >>>> worked >>>> too, but that's apparently not how the V4L2 API was designed. >>>> >>>> I don't think there is any particular overhead caused by having n >>>> ioctls >>>> instead of a single one. At least that would be very surprising >>>> IMHO. >>> >>> Well, there is small syscall overhead, which normally shouldn't be >>> very painful, although with all the speculative execution hardening, >>> can't be sure of anything anymore. :) >> >> Oh, my mistake then, I had it in mind that it is not really something >> noticeable. Hopefully, it won't be a limiting factor in our cases. > > With typical frame rates achievable by hardware codecs, I doubt that > it would be a limiting factor. We're using a similar API (a WiP > version of pre-Request API prototype from long ago) in Chrome OS > already without any performance issues. Thank you very much for the answers! The syscalls overhead is miserable in comparison to the rest of decoding, though I wanted to clarify whether there is a way to avoid it. Atomic API sounds like something that would suit well for that.