Re: Gen-art 2nd LC review of draft-ietf-payload-vp8-16

Harald Alvestrand <harald@xxxxxxxxxxxxx> · Thu, 30 Jul 2015 10:14:06 +0200

Unfortunately this is vacation time in Scandinavia - won't do anything
with this until mid-August, I'm afraid. (Henrik might.)

So it'll have to wait another 2 telechats or so.....

On 07/29/2015 11:58 PM, Ben Campbell wrote:
> Hi VP8 Authors:
>
> Has there been a response to Elwyn's Gen-ART review from the authors?
> If so, I haven't seen it. (I did see Colin's response).
>
> Based Elwyn's review, and on Harald's response to my earlier
> questions, I assume that the authors will wish to update the draft
> before it goes back to an IESG telechat. If that's not the case,
> please let me know. Please be advised that tomorrow (July 30) is the
> nominal deadline for getting things on the August 6 telechat.
>
> Thanks!
>
> Ben.
>
>
> On 13 Jul 2015, at 15:57, Ben Campbell wrote:
>
>> (+payload and Harald)
>>
>> Elwyn: Thanks for the detailed review!
>>
>> Authors: Please address Elwyn's comments at your earliest
>> convenience. There's quite a bit here, and I assume we will need a
>> new revision before taking this to the IESG telechat, but it would be
>> helpful to discuss any proposed changes via email before submitting.
>>
>> Thanks!
>>
>> Ben.
>>
>>
>> On 13 Jul 2015, at 6:58, Elwyn Davies wrote:
>>
>>> I am the assigned Gen-ART reviewer for this draft. For background on
>>> Gen-ART, please see the FAQ at
>>>
>>> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
>>>
>>> Please resolve these comments along with any other Last Call comments
>>> you may receive.
>>>
>>> Document: draft-ietf-payload-vp8-16.txt
>>> Reviewer: Elwyn Davies
>>> Review Date: 2015/07/10
>>> IETF LC End Date: 2015/07/13
>>> IESG Telechat date: (if known) -
>>>
>>> Summary: Gosh, it has been a long time since the last review!
>>>
>>> Major issues:
>>>
>>> Minor issues:
>>> s4.2: Various of the frame indices either SHOULD or, in the case of
>>> KEYIDX, MAY start at random values.  The rationale(s) for these
>>> requirements are not explained - nor is the reason why it it is only a
>>> MAY for KEYIDX whereas it is SHOULD for PictureID and what might happen
>>> were the MAY/SHOULD to be ignored.
>>>
>>> Is the rationale something that should be mentioned in the security
>>> considerations?
>>>
>>> s10.1:  The downref to RFC 6386 identified by idnits was duly noted
>>> in the last call
>>> announcement.  I don't have a problem with the downref.
>>>
>>> Nits/editorial comments:
>>>
>>> General: Consistent usage of "prediction and mode" vs
>>> "prediction/mode" would be desirable.
>>>
>>> s1, para 2: Needs to introduce the 'macroblock' jargon defined in
>>> RFC 6386. Suggest:
>>> OLD:
>>> VP8 is based on decomposition of frames into square sub-blocks of
>>> pixels, prediction of such sub-blocks using previously constructed
>>> blocks, and...
>>> NEW:
>>> VP8 is based on decomposition of frames into square sub-blocks of
>>> pixels known as "macroblocks" (see Section 2 of [RFC6386]).  Prediction
>>> of such sub-blocks using previously constructed blocks, and...
>>> END
>>>
>>> s2: There are a number of technical terms brought over from RFC 6386.
>>> These should be listed in s2.  At least the following would be in
>>> the list:
>>> key frame, interframe, golden frame, altref frame, macroblock.
>>> Also the terms picture selection index (RPSI) and slice loss
>>> indication (SLI) from RFC 4585.
>>>
>>> s3, para 2: Providing a reference to explain what temporal scalability
>>> and the hierarchy selection is all about would be useful.  One
>>> possibility
>>> would be:
>>>
>>> Lim, J. Yang, and B. Jeon,”Fast Coding Mode Decision for Scalable
>>> Video Coding”,
>>> 10th Int’l Conf. on Advanced Communication Technology, vol. 3, pp.
>>> 1897-1900, 2008.
>>>
>>> It would also help to add a pointer to the TL0PICIDX description in
>>> s4.2, thus:
>>> ADD:
>>> Support for temporal scalability is provided by the optional
>>> TL0PICIDX and TID/Y/KEYIDX
>>> fields described in Section 4.2.
>>> END
>>>
>>> ss3 and 4: The discussion of how the frame payload might or might
>>> not be distributed across
>>> multiple packets is not very clear.   The draft has in mind a
>>> 'preferred use case' (para 1
>>> of s4.4 says:
>>>
>>>> In the preferred use case, the S bit and PID fields
>>>> described in Section 4.2 should be used to indicate what the packet
>>>> contains.
>>>
>>> I think it would be helpful to be a bit more positive about this in
>>> s3, para 3:
>>> OLD:
>>> This memo
>>> allows the partitions to be sent separately or in the same RTP
>>> packet.  It may be beneficial for decoder error-concealment to send
>>> the partitions in different packets, even though it is not mandatory
>>> according to this specification.
>>> NEW:
>>> Accordingly, this document RECOMMENDS that the frame is packetized
>>> by the sender
>>> with each data partition in a separate packet or packets.  This may
>>> be beneficial
>>> for decoder error concealment and the payload format described in
>>> Section 4 provides
>>> fields that allow the partitions to be identified even if the first
>>> partition is
>>> not available.  The sender can, alternatively, aggregate the data
>>> partitions into
>>> a single data stream and, optionally, split it into several packets
>>> without
>>> consideration of the partition boundaries.  The receiver can use the
>>> length
>>> information in the first partition to identify the partitions during
>>> decoding.
>>> END
>>>
>>> s4/Figure 1: s/bytes/octets/ (2 places)
>>>
>>> Figure 1, caption: s/sequel/Sections 4.2 and 4.3/
>>>
>>> Figure 1: The format shown is correct only for the first packet in a
>>> frame.  Subsequent
>>> packets will not contain the payload header and will have later
>>> octets of the frame payload.
>>> This should be mentioned in drawing and/or in the caption.
>>>
>>> s4.1, last two paras:
>>>> Sequence number:  The sequence numbers are monotonically increasing
>>>> and set as packets are sent.
>>>>
>>>> The remaining RTP header fields are used as specified in
>>>> [RFC3550].
>>> Redefining the Sequence Number field seems gratuitous and
>>> potentially dangerous,
>>> given that it is *very carefully* defined in RFC 3550 and the
>>> overall intent does
>>> not seem any different from that in RFC 3550.  OTOH a note about the
>>> (non?-)use of the X bit
>>> and the Payload Type field (PT) would be useful.  I suggest
>>> replacing the paras above with:
>>> NEW:
>>> X bit: MUST be 0.  The VP8 RTP profile does not use the RTP Header
>>> Extension capability.
>>>
>>> PT (Payload Type):  In line with the policy in Section 3 of
>>> [RFC3551], applications
>>> using the VP8 RTP payload profile MUST assign a dynamic payload type
>>> number to
>>> be used in each RTP session and provide a mechanism to indicate the
>>> mapping.
>>> See Section 6.2 for the mechanism to be used with the Session
>>> Description Protocol (SDP)
>>> [RFC4566].
>>>
>>> The remaining RTP Fixed Header Fields (V, P, CC, sequence number,
>>> SSRC and CSRC
>>> identfiers) are used as specified in Section 5.1 of [RFC5330].
>>> END
>>> Note that this may be considered to make the reference to RFC 3551
>>> normative.
>>>
>>> s4.2, X bit:  There is potential for confusion between the X bit in
>>> the fixed header
>>> and the X bit in the VP8 Payload Descriptor.  Perhaps changing this
>>> to C for control bits
>>> would avoid the problem.
>>>
>>> s4.2, M bit: Similarly, it might be better to use B (for BIG) in
>>> place of M to avoid confusion.
>>>
>>> s4.2, para 7 after Fig 2: For consistency:
>>> OLD:
>>> When the X bit is set to 1 in the first octet, the extension bit
>>> field octet MUST be provided as the second octet.  If the X bit is 0,
>>> the extension bit field octet MUST NOT be present, and no extensions
>>> (I, L, T, or K) are permitted.
>>> NEW:
>>> When the X [or C] bit is set to 1 in the first octet, the Extended
>>> Control Bits
>>> field octet MUST be provided as the second octet.  If the X [or C]
>>> bit is 0,
>>> the extension bit field octet MUST NOT be present, and no extensions
>>> (I, L, T, or K) are permitted.
>>> END
>>>
>>> s4.2: The issue of using either 'wrap' or 'restart' but not both for
>>> restarting number sequences has been raised in various comments by
>>> IESG members.
>>>
>>> s4.2, TL0PICIDX: I think, given the statements in s3 about not
>>> defining the hierarchy, that more explanation is needed of what the
>>> 'lowest or base layer' actually means.
>>>
>>> s4.2, TL0PICIDX:
>>>> If TID is larger than 0, TL0PICIDX
>>>>   indicates which base layer frame the current image depends on.
>>>>   TL0PICIDX MUST be incremented when TID is 0.
>>> What happens if L=1 but both T=0 and K=0 so that there is no TID
>>> value present? Or indeed if T=0 but K=1 so that the TID field is
>>> there but 'MUST be ignored by the receiver'  (definition of TID field)?
>>>
>>> s4.3: It would be useful to include the section number (9.1) where
>>> the uncompressed data chunk specification is found.  I was somewhat
>>> surprised that the ordering is reversed in the protocol spec.
>>>
>>> s4.3, VER: The receiver should validate this field - currently only
>>> values 0-3 are valid.
>>>
>>> s4.3, SizeN:  Make it clear these are integer fields: s/in Size0,/in
>>> integer fields Size0,/
>>>
>>> s4.3 and s4.4:  It would be helpful to implementers to explain what
>>> the offsets in the  first partition payload are for the additional
>>> partition count and additional partition sizes.  Having rummaged
>>> around in RFC  6386, I have to say that I am unclear whether the
>>> values are placed after the full chunk of data indicated by
>>> 1stPartitionSize or are part of this data. And if the latter is the
>>> case, how to work out where the partition sizes are.  I guess that
>>> they ought to be at offset (1stPartitionSize+10 - key frame - or
>>> 1stPartitionSize+3 - interframe) in the VP8 Payload field as it is
>>> difficult to work out where they are without decoding the partition
>>> otherwise.  Also exactly how many partition sizes to expect - I
>>> think I read in RFC 6386 that the last partition size is implicit.
>>> Help!
>>>
>>> In the course of clearing this up, it might be appropriate to move
>>> the last sentence of the first para of s4.4 and the second para of
>>> s4.4 into s4.3 as part of the explanation.  This is the relevant text:
>>>> Note that the length of the first partition can
>>>> always be obtained from the first partition size parameter in the VP8
>>>> payload header.
>>>>
>>>> The VP8 bitstream format [RFC6386] specifies that if multiple DCT/WHT
>>>> partitions are produced, the location of each partition start is
>>>> found at the end of the first (prediction/mode) partition.  In this
>>>> RTP payload specification, the location offsets are considered to be
>>>> part of the first partition.
>>>
>>>
>>> s4.4:  I found this section very difficult to parse.  It is a bit
>>> 'stream of consciousness' and would benefit from a reordering and
>>> rewrite.  Suggestion (assuming the second para gets moved to s4.3):
>>> NEW SECTION 4.4:
>>> An encoded VP8 frame can be divided into two or more partitions, as
>>> described in Section 1.  It is OPTIONAL for a packetizer implementing
>>> this RTP specification
>>> to pay attention to the partition boundaries within an encoded frame.
>>> If packetization of a frame is done without considering the partition
>>> boundaries, the PID field MAY be set to zero for all packets, and the
>>> S bit MUST NOT be set to one for any other packet than the first.
>>>
>>> If the preferred usage suggested in Section 3 is followed, with each
>>> packet
>>> carrying data from exactly one partition, the S bit and PID fields
>>> described in Section 4.2 SHOULD be used to indicate what the packet
>>> contains.  The PID field should indicate which partition the first
>>> octet of the payload belongs to, and the S bit indicates that the
>>> packet starts on a new partition.
>>>
>>> If the packetizer does not pay attention to the partition
>>> boundaries, one
>>> packet can contain a fragment of a  partition, a complete partition,
>>> or an
>>> aggregate of fragments and partitions.  There is no explicit
>>> signaling of
>>> partition boundaries in the payload and the partition lengths at the
>>> end of
>>> the first partition have to be used to identify the boundaries.
>>> Partitions
>>> MUST be aggregated in decoding order.  Two fragments from different
>>> partitions MAY be aggregated into the same packet along with one or
>>> more
>>> complete partitions.
>>>
>>> In all cases, the payload of a packet MUST contain data from only
>>> one video
>>> frame.  Consequently the set of packets carrying the data from a
>>> particular
>>> frame will contain exactly one VP8 Payload Header (see Section 4.3)
>>> carried
>>> in the first packet of the frame.  The last, or only, packet
>>> carrying data for the
>>> frame MUST have the M bit set in the RTP header.
>>>
>>> s4.5.1:  Th algorithm only applies for the RECOMMENDED use case with
>>> partitions in separate packets.
>>> This should be noted.
>>>
>>> s5.2, last para: s/only parts of the frame is corrupted./only part
>>> of the frame is corrupted./
>>>
>>> s6: s/This payload format has two required parameters./This payload
>>> format has two optional parameters./
>>> [I think this is probably a hangover from the previous version.]
>>>
>>> s7: s/Cryptographic system may also allow/The cryptographic system
>>> may also allow/
>>>
>>> s7:
>>>> the usage of SRTP [RFC3711] is recommended.
>>> RECOMMENDED?
>>>
>>> s10.2: I suspect that RFC 3551 is normative.

-- 
Surveillance is pervasive. Go Dark.