Re: [PATCH v4 5/6] media: Add controls for JPEG quantization tables

Ezequiel Garcia <ezequiel@xxxxxxxxxxxxx> · Mon, 03 Sep 2018 14:15:20 -0300

On Mon, 2018-09-03 at 16:51 +0100, Ian Arkver wrote:
> Hi Hans, Ezequiel,
> 
> On 03/09/2018 16:33, Hans Verkuil wrote:
> > On 09/03/2018 05:27 PM, Ezequiel Garcia wrote:
> > > Hi Ian, Hans:
> > > 
> > > On Mon, 2018-09-03 at 14:29 +0100, Ian Arkver wrote:
> > > > Hi,
> > > > 
> > > > On 03/09/2018 10:50, Hans Verkuil wrote:
> > > > > On 08/31/2018 05:52 PM, Ezequiel Garcia wrote:
> > > > > > From: Shunqian Zheng <zhengsq@xxxxxxxxxxxxxx>
> > > > > > 
> > > > > > Add V4L2_CID_JPEG_QUANTIZATION compound control to allow userspace
> > > > > > configure the JPEG quantization tables.
> > > > > > 
> > > > > > Signed-off-by: Shunqian Zheng <zhengsq@xxxxxxxxxxxxxx>
> > > > > > Signed-off-by: Ezequiel Garcia <ezequiel@xxxxxxxxxxxxx>
> > > > > > ---
> > > > > >    .../media/uapi/v4l/extended-controls.rst      | 23 +++++++++++++++++++
> > > > > >    .../media/videodev2.h.rst.exceptions          |  1 +
> > > > > >    drivers/media/v4l2-core/v4l2-ctrls.c          | 10 ++++++++
> > > > > >    include/uapi/linux/v4l2-controls.h            |  5 ++++
> > > > > >    include/uapi/linux/videodev2.h                |  1 +
> > > > > >    5 files changed, 40 insertions(+)
> > > > > > 
> > > > > > diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> > > > > > index 9f7312bf3365..e0dd03e452de 100644
> > > > > > --- a/Documentation/media/uapi/v4l/extended-controls.rst
> > > > > > +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> > > > > > @@ -3354,7 +3354,30 @@ JPEG Control IDs
> > > > > >        Specify which JPEG markers are included in compressed stream. This
> > > > > >        control is valid only for encoders.
> > > > > >    
> > > > > > +.. _jpeg-quant-tables-control:
> > > > > >    
> > > > > > +``V4L2_CID_JPEG_QUANTIZATION (struct)``
> > > > > > +    Specifies the luma and chroma quantization matrices for encoding
> > > > > > +    or decoding a V4L2_PIX_FMT_JPEG_RAW format buffer. The two matrices
> > > > > > +    must be set in JPEG zigzag order, as per the JPEG specification.
> > > > > 
> > > > > Can you change "JPEG specification" to a reference to the JPEG spec entry
> > > > > in bibio.rst?
> > > > > 
> > > > > > +
> > > > > > +
> > > > > > +.. c:type:: struct v4l2_ctrl_jpeg_quantization
> > > > > > +
> > > > > > +.. cssclass:: longtable
> > > > > > +
> > > > > > +.. flat-table:: struct v4l2_ctrl_jpeg_quantization
> > > > > > +    :header-rows:  0
> > > > > > +    :stub-columns: 0
> > > > > > +    :widths:       1 1 2
> > > > > > +
> > > > > > +    * - __u8
> > > > > > +      - ``luma_quantization_matrix[64]``
> > > > > > +      - Sets the luma quantization table.
> > > > > > +
> > > > > > +    * - __u8
> > > > > > +      - ``chroma_quantization_matrix[64]``
> > > > > > +      - Sets the chroma quantization table.
> > > > > 
> > > > > Just checking: the JPEG standard specifies this as unsigned 8-bit values as well?
> > > 
> > > I thought this was already discussed, but I think the only thing I've added
> > > is this comment in one of the driver's headers:
> > > 
> > >   JPEG encoder
> > >   ------------
> > >   The VPU JPEG encoder produces JPEG baseline sequential format.
> > >   The quantization coefficients are 8-bit values, complying with
> > >   the baseline specification. Therefore, it requires application-defined
> > >   luma and chroma quantization tables. The hardware does entrophy
> > >   encoding using internal Huffman tables, as specified in the JPEG
> > >   specification.
> > > 
> > > Certainly controls should be specified better.
> > > 
> > > > As far as I can see ISO/IEC 10918-1 does not specify the precision or
> > > > signedness of the quantisation value Qvu. The default tables for 8-bit
> > > > baseline JPEG all fit into __u8 though.
> > > > 
> > > 
> > > Paragraph 4.7 of that spec, indicates the "sample" precision:
> > > 8-bit for baseline; 8-bit or 12-bit for extended.
> > > 
> > > For the quantization coefficients, the DQT segment contains a bit
> > > that indicates if the quantization coefficients are 8-bit or 16-bit.
> > > See B.2.4.1 for details.
> 
> See below (and Tq which follows the Pq field)
> 
> > > 
> > > > However there can be four sets of tables in non-baseline JPEG and it's
> > > 
> > > You lost me here, which four sets of tables are you refering to?
> > > 
> > > > not clear (to me) whether 12-bit JPEG would need more precision (I'd
> > > > guess it would).
> > > 
> > > It seems it would. From B.2.4.1:
> > > 
> > > "An 8-bit DCT-based process shall not use a 16-bit precision quantization table."
> > > 
> > > > Since this patch is defining UAPI I think it might be
> > > > good to build in some additional information, eg. number of tables,
> > > > element size. Maybe this can all be inferred from the selected pixel
> > > > format? If so then it would need documented that the above structure
> > > > only applies to baseline.
> > > > 
> > > 
> > > For quantization coefficients, I can only see two tables: one for luma
> > > one for chroma. Huffman coefficients are a different story and we are
> > > not really adding them here.
> 
> I was looking at the definition of Tqi in the frame header in B.2.2 
> which seems to allow up to four (sets of?) quantization tables. 
> Rereading it, it seems these are per component. Table B.2 implies that 
> this applies to Baseline Sequential too. In the DQT marker description 
> there's a Tq field to specify the destination for the new table. I think 
> this means that an encoder can use up to four (sets of) tables and a 
> decoder should be able to store four from the stream.
> 
> This may well not be relevant to the encoder under discussion, but if 
> it's not allowed for in UAPI it's almost a given that it'll need to be 
> added later.
> 

Hm, I think it's not really relevant for us, either on the encoding
or decoding side. Let me explain how I read the spec.

First of all, keep in mind it seems to be written with streams
in mind, which explains different start-of-image and start-of-frames
segments (SOI and SOF).

The way I read the four tables thing, the decoder expects to be first
transmitted a set of quantization tables, via DQT segments. Each DQT
segment contains a 4-bit index (0-3), allowing up to four quantization
tables to be defined.

Then, each frame is transmitted in a SOF segment. The SOF header
contains an 8-bit field (Tq_i) indicating which quantization table has
to be used for this frame.

In Video4Linux, we are frame-based, and the JPEG segments have no meaning,
because these controls are meant to be used with the JPEG RAW format,
as specified.

For the decoder side, the application is expected to parse all the segments,
and set quantization tables and compressed bitstream, via legacy or
request APIs (probably the latter).

Same goes for encoding, the user will set a quantization table for the
encoding process and then take care of the JPEG segments creation.

> BTW, my copy of the spec is dated 1993(E). Maybe it's out of date?
> 

My copy is also 1992, so even more dated :-)

> > 
> > Since (if I understand this correctly) we would need u16 for extended precision
> > JPEG, shouldn't we use u16 instead of u8? That makes the control more generic.
> 
> This seems like a safer option to me.
> 

Yes, I agree too.

> 
> > 
> > BTW, are the coefficients always unsigned? I think so, but I never read the
> > JPEG spec.
> > 

I don't think signed quantization coefficients make sense, so perhaps
this is not explicitly specified?

Regards,
Eze