Tsvart last call review of draft-ietf-netvc-requirements-09

Bernard Aboba via Datatracker <noreply@xxxxxxxx> · Fri, 24 May 2019 11:46:00 -0700

Reviewer: Bernard Aboba
Review result: Not Ready

This document has been reviewed as part of the transport area review team's
ongoing effort to review key IETF documents. These comments were written
primarily for the transport area directors, but are copied to the document's
authors and WG to allow them to address any issues raised and also to the IETF
discussion list for information.

When done at the time of IETF Last Call, the authors should consider this
review as part of the last-call comments they receive. Please always CC
tsv-art@xxxxxxxx if you reply to or forward this review.

Summary
----------
Overall, this document seems more focused on the requirements for development
of codecs such as H.264 than on the requirements that would enable widescale
adoption of a next generation codec.  In practice, requirements reducing the
fragmentation of implementations (such as a requirement that a compliant
decoder be able to decode anything that an encoder can send) have proved
critical to success, yet this document omits them.  Also, the document appears
focused on video technology as of 4-5 years ago, rather than the technology
used in today's streaming and video conferencing services where support for
scalable video coding (and advanced modes such as K-SVC) has become critically
important.

Issues
------

Section 2.1

   Video material is encoded at different quality levels and different
   resolutions, which are then chosen by a client depending on its
   capabilities and current network bandwidth....

   o  Scalability or other forms of supporting multiple quality
      representations are beneficial if they do not incur significant
      bitrate overhead and if mandated in the first version.

[BA] The words "are beneficial" suggest that support for scalability is
optional.  In practice, support for both temporal and spatial scalability has
proved to be important since it has been widely adopted in dynamic streaming
applications, in which the video material is encoded once and played back at
framerates, resolutions and quality levels dependent on network conditions and
the characteristics of the endpoint devices.
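
As a rough illustration of the encode-once model, the sketch below picks the
highest layer of a single scalable encode that fits both the available
bandwidth and the endpoint's display.  The layer ladder, its bitrates and the
name select_layer are hypothetical values chosen for illustration; none of
them come from the draft.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """One operating point of a single scalable encode (hypothetical values)."""
    width: int
    height: int
    fps: int
    cumulative_kbps: int  # bitrate needed for this layer plus all layers below it


# Hypothetical ladder produced by one scalable encode, ordered lowest to highest.
LADDER = [
    Layer(320, 180, 15, 150),
    Layer(640, 360, 30, 500),
    Layer(1280, 720, 30, 1500),
]


def select_layer(ladder: List[Layer], available_kbps: int,
                 max_height: int) -> Optional[Layer]:
    """Return the highest layer that fits the bandwidth and the display, or None."""
    best = None
    for layer in ladder:
        if layer.cumulative_kbps <= available_kbps and layer.height <= max_height:
            best = layer
    return best
```

With this ladder, an endpoint that has 600 kbps available is served the
640x360 layer without any re-encoding on the sender side.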

Section 2.5

[BA] This section does not mention support for screen content coding tools. 
Given that these tools are so effective in reducing the bandwidth required for
application sharing (compression of 75 percent is common), it is hard to
imagine a next generation codec that would not support screen content coding.

Section 2.6

[BA] Support for K-SVC modes has turned out to be important for game streaming,
since these modes reduce delay spikes that would otherwise result from
generation of a key frame.  Since K-SVC modes have unusual characteristics
(e.g. frames within a single temporal unit may not share the same temporal ID),
they impose unique requirements on a video codec design.
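
To make the dependency structure concrete, here is a minimal sketch.  It
assumes the usual description of K-SVC, in which a spatial layer predicts from
the layer below only in the key picture and from its own layer afterwards.  It
ignores temporal layering, so it does not model the temporal ID behaviour
mentioned above.  The function name and the (unit, spatial_id) frame labels
are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[int, int]  # (temporal_unit_index, spatial_id)


def ksvc_references(unit: int, spatial_id: int) -> List[Frame]:
    """Reference frames in a simplified K-SVC structure (unit 0 is the key picture).

    Assumptions: no temporal layers and at most one reference per frame.
    """
    if unit == 0:
        # Key picture: the only place where a layer predicts from the layer below.
        return [] if spatial_id == 0 else [(0, spatial_id - 1)]
    # Non-key pictures: predict only from the previous frame of the same layer.
    return [(unit - 1, spatial_id)]
```

For example, ksvc_references(0, 2) depends on spatial layer 1 of the key
picture, while ksvc_references(3, 2) depends only on spatial layer 2 of the
previous unit.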

   3.2.3. Complexity:

   o  Feasible real-time implementation of both an encoder and a
      decoder supporting a chosen subset of tools for hardware and
      software implementation on a wide range of state-of-the-art
      platforms.

[BA] This sentence seems to imply that the tools supported in hardware and
software might be different.  In practice, this is problematic, particularly if
support for some tools can be omitted at lower profile levels, because
application developers then need to handle the disparities in tool support
between different implementations.
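
A small sketch of the burden this places on applications: when decoders differ
in the tools they support, the sender can only enable the tools common to all
receivers.  The tool names and endpoint names below are hypothetical and are
used only for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical decoder capabilities, for illustration only.
DECODER_TOOLS: Dict[str, FrozenSet[str]] = {
    "hw_phone": frozenset({"temporal_svc"}),
    "sw_desktop": frozenset({"temporal_svc", "spatial_svc", "screen_content"}),
    "hw_tv": frozenset({"temporal_svc", "spatial_svc"}),
}


def usable_tools(encoder_tools: Set[str], receivers: Iterable[str]) -> Set[str]:
    """Tools the encoder may enable so that every listed receiver can decode."""
    common = set(encoder_tools)
    for name in receivers:
        common &= DECODER_TOOLS[name]
    return common
```

In a call that includes both "hw_phone" and "sw_desktop", only "temporal_svc"
survives the intersection, so the least capable decoder dictates what every
participant receives.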

   3.2.4. Scalability:

   o  Temporal (frame-rate) scalability should be supported.

[BA] In practice, a next generation video codec needs to support spatial
scalability as well as temporal scalability.

   3.2.5. Error resilience:

   o  Error resilience tools that are complementary to the error
      protection mechanisms implemented on transport level should be
      supported.

   o  The codec should support mechanisms that facilitate packetization
      of a bitstream for common network protocols.

[BA] Both of these points require more elaboration.  What error resilience
tools are being referred to, and what mechanisms are perceived to facilitate
packetization?  Is the latter referring to video codec syntax (e.g. NAL unit
structure)?

   o  The codec should support effective mechanisms for allowing
      decoding and reconstruction of significant parts of pictures in
      the event that parts of the picture data are lost in
      transmission.

[BA] Not sure what this is referring to either.

   3.3.2. Scalability:

   o  Resolution and quality (SNR) scalability that provide low
      compression efficiency penalty (up to 5% of BD-rate [12] increase
      per layer with reasonable increase of both computational and
      hardware complexity) can be supported in the main profile of the
      codec being developed by the NETVC WG. Otherwise, a separate
      profile is needed to support these types of scalability.

[BA] Mixing support for scalability with profile negotiation leads to
implementation balkanization that dramatically increases the complexity of
application development.  A better principle is that a compliant decoder should
be able to decode any bitstream that an encoder can send.