Dear Bernard,

Thank you very much for your comments. Please find my clarifications inline. I hope you find them helpful.

--
Best regards,
Alexey Filippov

-----Original Message-----
From: Bernard Aboba via Datatracker [mailto:noreply@xxxxxxxx]
Sent: Friday, May 24, 2019 9:46 PM
To: tsv-art@xxxxxxxx
Cc: draft-ietf-netvc-requirements.all@xxxxxxxx; video-codec@xxxxxxxx; ietf@xxxxxxxx
Subject: Tsvart last call review of draft-ietf-netvc-requirements-09

Reviewer: Bernard Aboba
Review result: Not Ready

This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@xxxxxxxx if you reply to or forward this review.

Summary
----------

Overall, this document seems more focused on the requirements for development of codecs such as H.264 than on the requirements that would enable widescale adoption of a next generation codec. In practice, requirements reducing the fragmentation of implementations (such as a requirement that a compliant decoder be able to decode anything that an encoder can send) have proved critical to success, yet this document omits them.

Also, the document appears focused on video technology as of 4-5 years ago, rather than the technology used in today's streaming and video conferencing services where support for scalable video coding (and advanced modes such as K-SVC) has become critically important.

[AF] This document was written to be tool-agnostic and as unrestrictive as possible while still covering the needs of a wide range of applications.
The requirements for spatial and quality scalability were discussed several times at NETVC meetings and on the NETVC mailing list in order to work out acceptable formulations.

Issues
------

Section 2.1

Video material is encoded at different quality levels and different resolutions, which are then chosen by a client depending on its capabilities and current network bandwidth....

o Scalability or other forms of supporting multiple quality representations are beneficial if they do not incur significant bitrate overhead and if mandated in the first version.

[BA] The words "are beneficial" suggest that support for scalability is optional. In practice, support for both temporal and spatial scalability has proved to be important since it has been widely adopted in dynamic streaming applications, in which the video material is encoded once and played back at framerates, resolutions and quality levels dependent on network conditions and the characteristics of the endpoint devices.

[AF] Of course, it is important to support resolution and quality scalability, provided that it doesn't adversely affect compression performance. It is always a trade-off. We state requirements for a codec; we do not describe a particular codec's architecture.

Section 2.5

[BA] This section does not mention support for screen content coding tools. Given that these tools are so effective in reducing the bandwidth required for application sharing (compression of 75 percent is common), it is hard to imagine a next generation codec that would not support screen content coding.

[AF] This section does not mention support for screen content coding tools because their absence will harm the compression performance of a codec. If these tools are not used in a codec, it cannot be competitive with other codecs. This will become apparent during testing (by the way, the testing draft contains screen content material).
On the other hand, we shouldn't insist on support for specific screen content tools, since that would restrict the freedom of codec developers.

Section 2.6

Support for K-SVC modes has turned out to be important for game streaming, since these modes reduce delay spikes that would otherwise result from generation of a key frame. Since K-SVC modes have unusual characteristics (e.g. frames within a single temporal unit may not share the same temporal ID), they impose unique requirements on a video codec design.

3.2.3. Complexity:

o Feasible real-time implementation of both an encoder and a decoder supporting a chosen subset of tools for hardware and software implementation on a wide range of state-of-the-art platforms.

[BA] This sentence seems to imply that the tools supported in hardware and software might be different. In practice, this is problematic, particularly if support for some tools can be omitted at lower profile levels, because application developers then need to handle the disparities between tools support in different implementations.

[AF] No, this sentence implies that a codec should be implementable in real time with at least a subset of its tools. Some non-normative tools (i.e., encoder-side tools such as a 2-pass encoder) and some normative tools can be skipped to enable real-time implementation on the majority of platforms.

3.2.4. Scalability:

o Temporal (frame-rate) scalability should be supported.

[BA] In practice, a next generation video codec also needs to support spatial scalability as well as temporal scalability.

[AF] Again, the trade-off between single-layer and multi-layer coding should be decided on the basis of concrete RD curves. It is well known that introducing temporal scalability doesn't harm compression performance, whereas spatial scalability can. So, different decisions on whether to include this scalability type are possible, depending on the architecture and compression performance of a codec.

3.2.5.
Error resilience:

o Error resilience tools that are complementary to the error protection mechanisms implemented on transport level should be supported.

o The codec should support mechanisms that facilitate packetization of a bitstream for common network protocols.

[BA] Both of these points require more elaboration. What error resilience tools are being referred to, and what mechanisms are perceived to facilitate packetization? Is the latter referring to video codec syntax (e.g. NAL unit structure)?

[AF]
> What error resilience tools are being referred to...

Any error resilience tools that can provide additional benefits compared with the mechanisms implemented at the transport level. The set of error resilience tools can differ between codecs (e.g., between wavelet-based codecs and H.26x).

> what mechanisms are perceived to facilitate packetization? Is the latter referring to video codec syntax (e.g. NAL unit structure)?

Yes, for example, the NAL unit structure.

o The codec should support effective mechanisms for allowing decoding and reconstruction of significant parts of pictures in the event that parts of the picture data are lost in transmission.

[BA] Not sure what this is referring to either.

[AF] In this statement, we meant tools like entropy coding (e.g., CABAC). If CABAC is not reset for each picture / slice / tile, the loss of a packet belonging to a given picture makes it impossible to restore all subsequent pictures. So, the frequency of CABAC resets should be chosen so as, on the one hand, to avoid damaging compression performance and, on the other hand, to allow "decoding and reconstruction of significant parts of pictures in the event that parts of the picture data are lost in transmission."

3.3.2.
Scalability:

o Resolution and quality (SNR) scalability that provide low compression efficiency penalty (up to 5% of BD-rate [12] increase per layer with reasonable increase of both computational and hardware complexity) can be supported in the main profile of the codec being developed by the NETVC WG. Otherwise, a separate profile is needed to support these types of scalability.

[BA] Mixing support for scalability with profile negotiation leads to implementation balkanization that dramatically increases the complexity of application development. A better principle is that a compliant decoder should be able to decode any bitstream that an encoder can send.

[AF] In the paragraph you cited, we meant a very simple thing: if the penalty in compression performance for resolution and quality (SNR) scalability is not that high, it makes sense to support them in the main profile. Otherwise, a separate profile is needed (as done in H.264/SVC or in H.265/SHVC).

> A better principle is that a compliant decoder should be able to decode any bitstream that an encoder can send.

It's very arguable which of the approaches is better. It's absolutely mandatory to make a base layer decodable by any decoder (even one that does not support a scalable profile, if any). I am not sure about enhancement layers. Moreover, some wavelet-based codecs (e.g., SPIHT or EZW) do not use the concept of layers at all. The paradigm of scalability is more natural there than in codecs that exploit the "hybrid video coding" paradigm (e.g., H.26x).
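
To make the rule quoted from 3.3.2 concrete, here is a minimal Python sketch of the decision it describes. It is only an illustration, not text from the draft: the function name, the argument names and the boolean used for "reasonable increase of complexity" are assumptions, and only the 5% per-layer BD-rate figure comes from the requirement itself.

```python
# Illustrative sketch only; all names here are hypothetical.
# The 5% per-layer limit is the one figure taken from the quoted requirement.
MAX_BD_RATE_INCREASE_PER_LAYER = 5.0  # percent


def profile_for_scalability(bd_rate_increase_per_layer, complexity_is_reasonable=True):
    """Return "main" if resolution/quality scalability fits the main profile,
    otherwise "separate".

    bd_rate_increase_per_layer: measured BD-rate increase (in percent) for
    each enhancement layer relative to single-layer coding.
    complexity_is_reasonable: assumed to be judged elsewhere; the draft does
    not quantify "reasonable".
    """
    within_budget = all(
        penalty <= MAX_BD_RATE_INCREASE_PER_LAYER
        for penalty in bd_rate_increase_per_layer
    )
    if within_budget and complexity_is_reasonable:
        return "main"
    return "separate"


print(profile_for_scalability([3.2, 4.8]))  # main
print(profile_for_scalability([3.2, 7.5]))  # separate
```

In this sketch a single layer exceeding the limit, or an unreasonable complexity increase, is enough to move scalability into a separate profile, which is how I read the "Otherwise" clause.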