On Wed, 30 Jan 2019 10:19:53 -0500, Nicolas Dufresne wrote:
> On Wednesday, 30 January 2019 at 08:47 +0100, Hans Verkuil wrote:
> > On 1/30/19 4:41 AM, Nicolas Dufresne wrote:
> > > Hi Hans,
> > >
> > > On Wednesday, 23 January 2019 at 11:44 +0100, Hans Verkuil wrote:
> > > > > +	if (*nplanes != 0) {
> > > > > +		if (vq->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {
> > > > > +			if (*nplanes != 1 ||
> > > > > +			    sizes[0] < channel->sizeimage_encoded)
> > > > > +				return -EINVAL;
> > > >
> > > > Question relating to calculating sizeimage_encoded: is that guaranteed to be
> > > > the largest buffer size that is needed to compress a frame? What if it is
> > > > not large enough after all? Does the encoder protect against that?
> > > >
> > > > I have a patch pending that allows an encoder to spread the compressed
> > > > output over multiple buffers:
> > > >
> > > > https://patchwork.linuxtv.org/patch/53536/
> > > >
> > > > I wonder if this encoder would be able to use it.
> > >
> > > Userspace around most existing codecs expects well-framed capture buffers
> > > from the encoder. Spreading out the buffer will just break this
> > > expectation.
> > >
> > > This is especially needed for VP8/VP9, as these formats are not meant to
> > > be streamed that way.
> >
> > Good to know, thank you.
> >
> > > I believe a proper solution to that would be to hang the encoding
> > > process and send an event (similar to resolution changes) to tell user
> > > space that capture buffers need to be re-allocated.
> >
> > That's indeed an alternative. I'll wait for further feedback from Tomasz
> > on this.
> >
> > I do want to add that allowing it to be spread over multiple buffers
> > also means more optimal use of memory, i.e. the buffers for the compressed
> > data no longer need to be sized for the worst case.
>
> My main concern is that it's no longer optimal for transcoding cases.
> To illustrate: H264 decoders still have the restriction that they need
> complete NALs for each memory pointer (if not a complete AU). The reason
> is that writing a parser that can handle a bitstream split across two
> unaligned buffers (in CPU terms and in NAL terms) is difficult and
> inefficient. So most decoders would need to duplicate the allocation in
> order to copy these input buffers into properly sized buffers. Note
> that for hardware like CODA, I believe this copy is always there, since
> the hardware uses a ring buffer. With high-bitrate streams, the
> overhead is significant. It also breaks the usage of hardware
> synchronization IP, which is a key feature on the ZynqMP.

I am a little bit confused about your use case. In transcoding cases
there is decoder -> encoder, i.e. the decoder comes first. You describe
the case where we have encoder -> decoder, for which I cannot imagine a
use case that is actually performance critical. I am not sure how the
hardware synchronization IP plays into this, but maybe that is because
I don't really understand your use case.
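Coming back to Hans' sizing question above: to make it concrete, a
conservative worst-case estimate usually starts from the raw frame size
and adds headroom, since a pathological frame can expand under
compression. A minimal sketch of that idea (the helper name and the
headroom factor are made up for illustration; this is not what the
driver actually computes):

	/*
	 * Illustration only: pessimistic size of one compressed frame,
	 * derived from resolution, 4:2:0 subsampling and bit depth.
	 * The +50% headroom is an assumption, not a spec value.
	 */
	static unsigned int worst_case_encoded_size(unsigned int width,
						    unsigned int height,
						    unsigned int bit_depth)
	{
		/* raw 4:2:0 frame size at the given bit depth, in bytes */
		unsigned int raw = width * height * 3 / 2 * bit_depth / 8;

		/* keep the full raw size plus headroom for headers, in
		 * case the encoder cannot compress the frame at all */
		return raw + raw / 2 + 1024;
	}

A real driver would presumably replace the fixed headroom with the
profile/level limits Nicolas describes below.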
Michael

> As Michael said, the vendor driver here predicts the allocation size
> based on width/height/profile/level and the chroma being used (that is
> encoded in the pixel format). The chroma was added later for the case
> of a level that supports both 8 and 10 bits, where running in 8-bit
> mode would otherwise lead to over-allocation of memory and VCU
> resources.
>
> But the vendor kernel goes a little beyond the spec by introducing more
> named profiles than are defined in the spec, so that it can further
> control the allocation (especially the VCU core allocation; otherwise
> you don't get to run as many instances in parallel).
>
> >
> > Regards,
> >
> > 	Hans
> >