I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
Please resolve these comments along with any other Last Call comments
you may receive.
Document: draft-ietf-payload-vp8-16.txt
Reviewer: Elwyn Davies
Review Date: 2015/07/10
IETF LC End Date: 2015/07/13
IESG Telechat date: (if known) -
Summary: Gosh, it has been a long time since the last review!
Major issues:
Minor issues:
s4.2: Various of the frame indices either SHOULD or, in the case of
KEYIDX, MAY start at random values. The rationale(s) for these
requirements are not explained - nor is the reason why it it is only
a
MAY for KEYIDX whereas it is SHOULD for PictureID and what might
happen
were the MAY/SHOULD to be ignored.
Is the rationale something that should be mentioned in the security
considerations?
s10.1: The downref to RFC 6386 identified by idnits was duly noted
in the last call
announcement. I don't have a problem with the downref.
Nits/editorial comments:
General: Consistent usage of "prediction and mode" vs
"prediction/mode" would be desirable.
s1, para 2: Needs to introduce the 'macroblock' jargon defined in RFC
6386. Suggest:
OLD:
VP8 is based on decomposition of frames into square sub-blocks of
pixels, prediction of such sub-blocks using previously constructed
blocks, and...
NEW:
VP8 is based on decomposition of frames into square sub-blocks of
pixels known as "macroblocks" (see Section 2 of [RFC6386]).
Prediction
of such sub-blocks using previously constructed blocks, and...
END
s2: There are a number of technical terms brought over from RFC 6386.
These should be listed in s2. At least the following would be in the
list:
key frame, interframe, golden frame, altref frame, macroblock.
Also the terms picture selection index (RPSI) and slice loss
indication (SLI) from RFC 4585.
s3, para 2: Providing a reference to explain what temporal
scalability
and the hierarchy selection is all about would be useful. One
possibility
would be:
Lim, J. Yang, and B. Jeon,”Fast Coding Mode Decision for Scalable
Video Coding”,
10th Int’l Conf. on Advanced Communication Technology, vol. 3, pp.
1897-1900, 2008.
It would also help to add a pointer to the TL0PICIDX description in
s4.2, thus:
ADD:
Support for temporal scalability is provided by the optional
TL0PICIDX and TID/Y/KEYIDX
fields described in Section 4.2.
END
ss3 and 4: The discussion of how the frame payload might or might not
be distributed across
multiple packets is not very clear. The draft has in mind a
'preferred use case' (para 1
of s4.4 says:
In the preferred use case, the S bit and PID fields
described in Section 4.2 should be used to indicate what the packet
contains.
I think it would be helpful to be a bit more positive about this in
s3, para 3:
OLD:
This memo
allows the partitions to be sent separately or in the same RTP
packet. It may be beneficial for decoder error-concealment to send
the partitions in different packets, even though it is not mandatory
according to this specification.
NEW:
Accordingly, this document RECOMMENDS that the frame is packetized by
the sender
with each data partition in a separate packet or packets. This may
be beneficial
for decoder error concealment and the payload format described in
Section 4 provides
fields that allow the partitions to be identified even if the first
partition is
not available. The sender can, alternatively, aggregate the data
partitions into
a single data stream and, optionally, split it into several packets
without
consideration of the partition boundaries. The receiver can use the
length
information in the first partition to identify the partitions during
decoding.
END
s4/Figure 1: s/bytes/octets/ (2 places)
Figure 1, caption: s/sequel/Sections 4.2 and 4.3/
Figure 1: The format shown is correct only for the first packet in a
frame. Subsequent
packets will not contain the payload header and will have later
octets of the frame payload.
This should be mentioned in drawing and/or in the caption.
s4.1, last two paras:
Sequence number: The sequence numbers are monotonically increasing
and set as packets are sent.
The remaining RTP header fields are used as specified in
[RFC3550].
Redefining the Sequence Number field seems gratuitous and potentially
dangerous,
given that it is *very carefully* defined in RFC 3550 and the overall
intent does
not seem any different from that in RFC 3550. OTOH a note about the
(non?-)use of the X bit
and the Payload Type field (PT) would be useful. I suggest replacing
the paras above with:
NEW:
X bit: MUST be 0. The VP8 RTP profile does not use the RTP Header
Extension capability.
PT (Payload Type): In line with the policy in Section 3 of
[RFC3551], applications
using the VP8 RTP payload profile MUST assign a dynamic payload type
number to
be used in each RTP session and provide a mechanism to indicate the
mapping.
See Section 6.2 for the mechanism to be used with the Session
Description Protocol (SDP)
[RFC4566].
The remaining RTP Fixed Header Fields (V, P, CC, sequence number,
SSRC and CSRC
identfiers) are used as specified in Section 5.1 of [RFC5330].
END
Note that this may be considered to make the reference to RFC 3551
normative.
s4.2, X bit: There is potential for confusion between the X bit in
the fixed header
and the X bit in the VP8 Payload Descriptor. Perhaps changing this
to C for control bits
would avoid the problem.
s4.2, M bit: Similarly, it might be better to use B (for BIG) in
place of M to avoid confusion.
s4.2, para 7 after Fig 2: For consistency:
OLD:
When the X bit is set to 1 in the first octet, the extension bit
field octet MUST be provided as the second octet. If the X bit is 0,
the extension bit field octet MUST NOT be present, and no extensions
(I, L, T, or K) are permitted.
NEW:
When the X [or C] bit is set to 1 in the first octet, the Extended
Control Bits
field octet MUST be provided as the second octet. If the X [or C]
bit is 0,
the extension bit field octet MUST NOT be present, and no extensions
(I, L, T, or K) are permitted.
END
s4.2: The issue of using either 'wrap' or 'restart' but not both for
restarting number sequences has been raised in various comments by
IESG members.
s4.2, TL0PICIDX: I think, given the statements in s3 about not
defining the hierarchy, that more explanation is needed of what the
'lowest or base layer' actually means.
s4.2, TL0PICIDX:
If TID is larger than 0, TL0PICIDX
indicates which base layer frame the current image depends on.
TL0PICIDX MUST be incremented when TID is 0.
What happens if L=1 but both T=0 and K=0 so that there is no TID
value present? Or indeed if T=0 but K=1 so that the TID field is
there but 'MUST be ignored by the receiver' (definition of TID
field)?
s4.3: It would be useful to include the section number (9.1) where
the uncompressed data chunk specification is found. I was somewhat
surprised that the ordering is reversed in the protocol spec.
s4.3, VER: The receiver should validate this field - currently only
values 0-3 are valid.
s4.3, SizeN: Make it clear these are integer fields: s/in Size0,/in
integer fields Size0,/
s4.3 and s4.4: It would be helpful to implementers to explain what
the offsets in the first partition payload are for the additional
partition count and additional partition sizes. Having rummaged
around in RFC 6386, I have to say that I am unclear whether the
values are placed after the full chunk of data indicated by
1stPartitionSize or are part of this data. And if the latter is the
case, how to work out where the partition sizes are. I guess that
they ought to be at offset (1stPartitionSize+10 - key frame - or
1stPartitionSize+3 - interframe) in the VP8 Payload field as it is
difficult to work out where they are without decoding the partition
otherwise. Also exactly how many partition sizes to expect - I think
I read in RFC 6386 that the last partition size is implicit. Help!
In the course of clearing this up, it might be appropriate to move
the last sentence of the first para of s4.4 and the second para of
s4.4 into s4.3 as part of the explanation. This is the relevant
text:
Note that the length of the first partition can
always be obtained from the first partition size parameter in the
VP8
payload header.
The VP8 bitstream format [RFC6386] specifies that if multiple
DCT/WHT
partitions are produced, the location of each partition start is
found at the end of the first (prediction/mode) partition. In this
RTP payload specification, the location offsets are considered to be
part of the first partition.
s4.4: I found this section very difficult to parse. It is a bit
'stream of consciousness' and would benefit from a reordering and
rewrite. Suggestion (assuming the second para gets moved to s4.3):
NEW SECTION 4.4:
An encoded VP8 frame can be divided into two or more partitions, as
described in Section 1. It is OPTIONAL for a packetizer implementing
this RTP specification
to pay attention to the partition boundaries within an encoded frame.
If packetization of a frame is done without considering the partition
boundaries, the PID field MAY be set to zero for all packets, and the
S bit MUST NOT be set to one for any other packet than the first.
If the preferred usage suggested in Section 3 is followed, with each
packet
carrying data from exactly one partition, the S bit and PID fields
described in Section 4.2 SHOULD be used to indicate what the packet
contains. The PID field should indicate which partition the first
octet of the payload belongs to, and the S bit indicates that the
packet starts on a new partition.
If the packetizer does not pay attention to the partition boundaries,
one
packet can contain a fragment of a partition, a complete partition,
or an
aggregate of fragments and partitions. There is no explicit
signaling of
partition boundaries in the payload and the partition lengths at the
end of
the first partition have to be used to identify the boundaries.
Partitions
MUST be aggregated in decoding order. Two fragments from different
partitions MAY be aggregated into the same packet along with one or
more
complete partitions.
In all cases, the payload of a packet MUST contain data from only one
video
frame. Consequently the set of packets carrying the data from a
particular
frame will contain exactly one VP8 Payload Header (see Section 4.3)
carried
in the first packet of the frame. The last, or only, packet carrying
data for the
frame MUST have the M bit set in the RTP header.
s4.5.1: Th algorithm only applies for the RECOMMENDED use case with
partitions in separate packets.
This should be noted.
s5.2, last para: s/only parts of the frame is corrupted./only part of
the frame is corrupted./
s6: s/This payload format has two required parameters./This payload
format has two optional parameters./
[I think this is probably a hangover from the previous version.]
s7: s/Cryptographic system may also allow/The cryptographic system
may also allow/
s7:
the usage of SRTP [RFC3711] is recommended.
RECOMMENDED?
s10.2: I suspect that RFC 3551 is normative.