Re: Comments: [AVT] Last Call: RTP Payload for Comfort Noise to Proposed Standard

Leland_Thompson@3com.com · Tue, 30 Apr 2002 18:12:55 -0500

Steve,

Thanks for the clarification; however, your answer is exactly what I am
concerned about.

>  What the last sentence of the paragraph was trying to say is that if
>  there is a large change in the timestamp from one packet to the next,
>  but the sequence number only increments by one, then the receiver
>  knows that no packets were lost and that the gap in time was due to
>  intentional discontinuous transmission.
<snip>
>  Again, to be clear, the sequence number will increment by 1 for each
>  packet.  The timestamp will increment by the amount of time that
>  passes.
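
To make sure the rule is being read correctly, here is a minimal sketch of
it in Python.  The names (RtpHeader, classify_gap, frame_ticks) are
hypothetical and chosen only for illustration; they are not from the draft.
The sketch assumes one frame per packet and a known frame duration in
timestamp units (for example, 160 for 20 ms at 8 kHz).  Note that the
classification can only be made once the next packet has arrived.

```python
from dataclasses import dataclass

SEQ_MOD = 1 << 16   # RTP sequence numbers are 16 bits wide
TS_MOD = 1 << 32    # RTP timestamps are 32 bits wide


@dataclass
class RtpHeader:
    """Only the two header fields the rule looks at (illustrative type)."""
    seq: int
    timestamp: int


def classify_gap(prev: RtpHeader, cur: RtpHeader, frame_ticks: int) -> str:
    """Classify what happened between two consecutively received packets.

    frame_ticks is an assumed, fixed frame duration in timestamp units.
    """
    seq_delta = (cur.seq - prev.seq) % SEQ_MOD
    ts_delta = (cur.timestamp - prev.timestamp) % TS_MOD

    if seq_delta == 1 and ts_delta <= frame_ticks:
        return "continuous"   # next packet, next frame: nothing missing
    if seq_delta == 1:
        return "silence"      # no packet missing, so the time gap was intentional
    # Sequence numbers skipped. Reordering and duplicates are not
    # distinguished from loss in this sketch.
    return "loss"


print(classify_gap(RtpHeader(10, 1600), RtpHeader(11, 9600), 160))
```

The receiver cannot call classify_gap() until `cur` exists, which is the
point the rest of this message turns on.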

My issue is not one of RTP Timestamps and Sequence Numbers; however, now
we can dig into the real concern at hand.

It seems that any protocol that actively communicates state transition
information to a system should, in general, notify the system at the start
of the event, not at the end of the event.  If a state transition has
occurred, I may need to take some action or do something differently.  With
Silence Suppression, this is obviously the case.

Of particular concern is not only that we do not know when transitions
actually occur but, just as importantly, that significant time may now
elapse without our knowing the actual state of the system.  Delays in
accurate state information create additional problems if a system can end
in a state that is not known to all parties.  If a transition back to
speech never happens, the state information from the previous transition
(the transition to silence) is lost, because your method relies on
receiving the next voice packet, which never arrives.

What happens during the umpteen frame periods in which we did not correctly
identify the silence period?  How does it impact the speech signal/voice
quality?  How is this error reflected by the system in the form of
statistics, counters, etc.?  Are the statistics still accurate?

If one allows the TimeStamp information together with the Sequence Number
to tell an RTP Decoder (Receiver) when a Loss Event is really just a
Silence Period, one is presented with the following dilemmas.

1)
    - What is one to do during the first audio frame time when data is not
      present?  In the absence of a valid CN/SID frame, most (or at least
      some) compliant implementations will transition to a Loss State,
      which will cause an Interpolation of the Codec's decoder to occur.
    - There is no reason to believe, yet, that Comfort Noise Generation
      should be activated.
    - Furthermore, if one were to activate CNG, what is to be generated?
      You don't even have a minimal noise level with which to match the
      background noise level of the channel, let alone the spectral
      information that might typically be present.

2)
What happens toward the end of a session when an RTP Encoder (Transmitter)
has transitioned to silence but the RTP Decoder (Receiver) thinks this may
be a loss event, and the call ends without the RTP Decoder ever seeing
another RTP packet, which would have told it "BIG change in TimeStamp,
little change in Seq Num"?  The state transition information is lost, and
inaccurate statistics could now be stored for this call because of it.
Would this scenario have a potential impact on the perceived Quality of
Service for this connection?  Absolutely it might!
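
The two cases above can be sketched with a small piece of receiver-side
bookkeeping, written here in Python.  This is only an illustration under
stated assumptions: the class GapTracker and its method names are
hypothetical, the receiver is assumed to be driven by a fixed frame clock,
and a sequence gap is simply attributed to loss.  It shows that frame
periods without data stay unclassified until another packet arrives, and
remain unclassified forever if the call ends first.

```python
class GapTracker:
    """Counts frame periods whose cause is only known after the next packet."""

    def __init__(self):
        self.last_seq = None   # sequence number of the last packet received
        self.pending = 0       # frame periods with no data, cause not yet known
        self.silence = 0       # frame periods later confirmed as silence
        self.lost = 0          # frame periods later attributed to loss

    def on_frame_period_without_packet(self):
        # The decoder must do something now (interpolate? CNG?) but nothing
        # received so far says which is right.
        self.pending += 1

    def on_packet(self, seq):
        if self.last_seq is not None and self.pending:
            if (seq - self.last_seq) % 65536 == 1:
                self.silence += self.pending   # "big timestamp, small seq" case
            else:
                self.lost += self.pending      # simplification: all counted as loss
        self.pending = 0
        self.last_seq = seq

    def on_call_end(self):
        # No further packet will ever resolve these frame periods.
        unresolved = self.pending
        self.pending = 0
        return unresolved


tracker = GapTracker()
tracker.on_packet(1)
tracker.on_packet(2)
for _ in range(50):            # sender went silent; receiver sees nothing
    tracker.on_frame_period_without_packet()
unresolved_at_end = tracker.on_call_end()
print(unresolved_at_end, tracker.silence, tracker.lost)
```

In this run the 50 frame periods are never confirmed as silence, so any
statistics built on them depend on a guess.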

Today there are real implementations of VoIP gateways operating in real
Carrier Networks that monitor Quality of Service (QoS) in the form of
Excessive Packet Loss indicators for traps and alarms within a Network
Operations Center (NOC).  It is very important, therefore, to actively and
accurately monitor state transitions within the system that could cause a
fault or alarm.  Silence Indication Descriptions in the form of CN or SID
frames are incredibly important in order to robustly detect these state
transitions at the point (time) of occurrence.  I strongly recommend we
revisit my original statements about RTP Decoding and about how higher
level protocol negotiations (e.g. SIP/SDP, H.323/H.245, etc.) may really
only make sense for establishing what an RTP Encoder (the transmitter
toward the packet network) does.

Therefore, if CN is not negotiated as supported, it should not be activated
or used.  VAD should only be allowed when negotiated as supported, and an
implementation of an IETF CN (silence indication method) should provide a
clearly identifiable transition of state, as close to the actual state
transition as possible, while communicating all the relevant information
needed to make Comfort Noise Generation (CNG) possible.

Lee