Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

Ayaka <ayaka@xxxxxxxxxxx> · Wed, 30 Jan 2019 15:03:51 +0800



Sent from my iPad

> On Jan 30, 2019, at 5:41 AM, Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> 
>> Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
>> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
>> <paul.kocialkowski@xxxxxxxxxxx> wrote:
>>> Hi,
>>> 
>>>> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
>>>> Sent from my iPad
>>>> 
>>>>> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <paul.kocialkowski@xxxxxxxxxxx> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>>> On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
>>>>>> I forget a important thing, for the rkvdec and rk hevc decoder, it would
>>>>>> requests cabac table, scaling list, picture parameter set and reference
>>>>>> picture storing in one or various of DMA buffers. I am not talking about
>>>>>> the data been parsed, the decoder would requests a raw data.
>>>>>> 
>>>>>> For the pps and rps, it is possible to reuse the slice header, just let
>>>>>> the decoder know the offset from the bitstream bufer, I would suggest to
>>>>>> add three properties(with sps) for them. But I think we need a method to
>>>>>> mark a OUTPUT side buffer for those aux data.
>>>>> 
>>>>> I'm quite confused about the hardware implementation then. From what
>>>>> you're saying, it seems that it takes the raw bitstream elements rather
>>>>> than parsed elements. Is it really a stateless implementation?
>>>>> 
>>>>> The stateless implementation was designed with the idea that only the
>>>>> raw slice data should be passed in bitstream form to the decoder. For
>>>>> H.264, it seems that some decoders also need the slice header in raw
>>>>> bitstream form (because they take the full slice NAL unit), see the
>>>>> discussions in this thread:
>>>>> media: docs-rst: Document m2m stateless video decoder interface
>>>> 
>>>> Stateless just mean it won’t track the previous result, but I don’t
>>>> think you can define what a date the hardware would need. Even you
>>>> just build a dpb for the decoder, it is still stateless, but parsing
>>>> less or more data from the bitstream doesn’t stop a decoder become a
>>>> stateless decoder.
>>> 
>>> Yes fair enough, the format in which the hardware decoder takes the
>>> bitstream parameters does not make it stateless or stateful per-se.
>>> It's just that stateless decoders should have no particular reason for
>>> parsing the bitstream on their own since the hardware can be designed
>>> with registers for each relevant bitstream element to configure the
>>> decoding pipeline. That's how GPU-based decoder implementations are
>>> implemented (VAAPI/VDPAU/NVDEC, etc).
>>> 
>>> So the format we have agreed on so far for the stateless interface is
>>> to pass parsed elements via v4l2 control structures.
>>> 
>>> If the hardware can only work by parsing the bitstream itself, I'm not
>>> sure what the best solution would be. Reconstructing the bitstream in
>>> the kernel is a pretty bad option, but so is parsing in the kernel or
>>> having the data both in parsed and raw forms. Do you see another
>>> possibility?
>> 
>> Is reconstructing the bitstream so bad? The v4l2 controls provide a
>> generic interface to an encoded format which the driver needs to
>> convert into a sequence that the hardware can understand. Typically
>> this is done by populating hardware-specific structures. Can't we
>> consider that in this specific instance, the hardware-specific
>> structure just happens to be identical to the original bitstream
>> format?
> 
> At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
Lucky, most of hardware won’t be able to processing such a big buffer.
General speaking, the register is 24bits for stream length in bytes.
> would be really really bad. In GStreamer project we have discussed for
> a while (but have never done anything about) adding the ability through
> a bitmask to select which part of the stream need to be parsed, as
> parsing itself was causing some overhead. Maybe similar thing applies,
> though as per our new design, it's the fourcc that dictate the driver
> behaviour, we'd need yet another fourcc for drivers that wants the full
> bitstream (which seems odd if you have already parsed everything, I
> think this need some clarification).
> 
>> 
>> I agree that this is not strictly optimal for that particular
>> hardware, but such is the cost of abstractions, and in this specific
>> case I don't believe the cost would be particularly high?