Steve,

Thanks for the clarification; however, your answer is exactly what I am concerned about.

> What the last sentence of the paragraph was trying to say is that if
> there is a large change in the timestamp from one packet to the next,
> but the sequence number only increments by one, then the receiver
> knows that no packets were lost and that the gap in time was due to
> intentional discontinuous transmission.

<snip>

> Again, to be clear, the sequence number will increment by 1 for each
> packet. The timestamp will increment by the amount of time that
> passes.

My issue is not one of RTP timestamps and sequence numbers; however, now we can dig into the real concern at hand.

It seems that any protocol that actively communicates state transition information to a system should, in general, notify the system at the start of the event, not at the end of the event. If a state transition has occurred, I may need to take some action or do something differently. With Silence Suppression, this is obviously the case.

Of particular concern is not only not knowing when transitions actually occur but, just as importantly, the possibility that significant time may elapse without knowing the actual state of the system. This last issue can cause other issues. For instance, delays in accurate state information create additional problems if a system can end in a state that is not known to all parties. If no transition back to speech ever occurs, the state information from the previous transmission (the silence transition) is lost, because your method relies on receiving the next voice packet, which never arrived.

What happens during the umpteen frame periods in which we did not correctly identify the silence period? How does it impact the speech signal and voice quality? How is this error reflected by the system in the form of statistics, counters, etc.? Are the statistics still accurate?
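To make the rule in the quoted text concrete, here is a minimal Python sketch of it. The names `RtpHeader` and `classify_gap` and the `samples_per_packet` parameter are hypothetical, chosen only for illustration, and the sketch assumes fixed-size packets and in-order arrival (no reordering or duplicates).

```python
from dataclasses import dataclass


@dataclass
class RtpHeader:
    seq: int        # 16-bit RTP sequence number
    timestamp: int  # 32-bit RTP timestamp, in sampling-clock units


def classify_gap(prev: RtpHeader, cur: RtpHeader, samples_per_packet: int) -> str:
    """Classify what happened between two consecutively received packets.

    Assumption: every packet carries exactly samples_per_packet samples.
    """
    # Modular differences so that counter wrap-around is handled.
    seq_delta = (cur.seq - prev.seq) & 0xFFFF
    ts_delta = (cur.timestamp - prev.timestamp) & 0xFFFFFFFF

    if seq_delta == 1:
        # No packet is missing; a timestamp jump means the sender was silent.
        return "silence" if ts_delta > samples_per_packet else "continuous"
    # Simplification: anything else is treated as loss.
    return "loss"
```

Note that the function needs `cur`, the packet that arrives after the gap. Until that packet shows up, the receiver has nothing to pass in, which is exactly the window of uncertainty described above.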

If one allows the timestamp and the sequence number together to tell an RTP Decoder (Receiver) when a loss event is really just a silence period, one is presented with the following dilemmas.

1) What is one to do during the first audio frame time when data is not present?

   - In the absence of a valid CN/SID frame, most (some) compliant implementations will transition to a Loss State, which will cause the codec's decoder to interpolate.
   - There is no reason to believe, yet, that Comfort Noise Generation should be activated.
   - Furthermore, if one were to activate CNG, what is to be generated? You don't even have a minimal noise level with which to match the background noise level of the channel, let alone the spectral information that might typically be present.

2) What happens toward the end of a session where an RTP Encoder (Transmitter) has transitioned to silence, but the RTP Decoder (Receiver) thinks this may be a loss event, and the call ends without the RTP Decoder ever seeing another RTP packet, which would have told it "BIG change in timestamp, little change in sequence number"? The state transition information is lost, and inaccurate statistics could now be stored for this call because of it.

Would this scenario have a potential impact on the perceived Quality of Service for this connection? Absolutely it might! Today there are real implementations of VoIP gateways operating in real carrier networks that monitor Quality of Service (QoS) in the form of Excessive Packet Loss indicators for traps and alarms within a Network Operations Center (NOC). It is therefore very important to actively and accurately monitor state transitions within the system that could cause a fault or alarm. Silence Indication Descriptions in the form of CN or SID frames are incredibly important in order to robustly detect these state transitions at the point (time) of occurrence.
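The end-of-session case in dilemma 2 can be illustrated with a small Python sketch of per-call frame accounting. Everything here is an assumption made for illustration: the function name `tally_frames`, the counter names, in-order arrival, and the default of 160 samples per packet (20 ms frames at an 8 kHz clock). It is not taken from any real gateway implementation.

```python
def tally_frames(packets, end_timestamp, samples_per_packet=160):
    """Count frame periods for one call using only timestamps and sequence numbers.

    packets: list of (seq, timestamp) tuples in arrival order (assumed in order).
    end_timestamp: RTP media time at which the session is torn down.
    """
    stats = {"received": 0, "lost": 0, "silence": 0, "unresolved": 0}
    prev = None
    for seq, ts in packets:
        stats["received"] += 1
        if prev is not None:
            seq_gap = (seq - prev[0]) & 0xFFFF
            ts_gap = (ts - prev[1]) & 0xFFFFFFFF
            # Frame periods with no packet between prev and this packet.
            missing = ts_gap // samples_per_packet - 1
            lost = seq_gap - 1  # packets the sender sent but we never saw
            stats["lost"] += lost
            # Heuristic: missing periods not explained by loss were silence.
            stats["silence"] += max(missing - lost, 0)
        prev = (seq, ts)
    if prev is not None:
        # Frame periods after the last packet: no later packet ever arrives
        # to say whether these were silence or loss.
        trailing = ((end_timestamp - prev[1]) & 0xFFFFFFFF) // samples_per_packet - 1
        stats["unresolved"] += max(trailing, 0)
    return stats
```

For example, `tally_frames([(1, 0), (2, 160), (3, 320)], end_timestamp=8480)` leaves 50 frame periods in the `unresolved` counter. A receiver that had already been concealing those periods as loss would report them as lost packets, whereas an explicit CN/SID frame at the start of the silence would have settled the question immediately.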

I strongly recommend we rethink my original statements about RTP decoding and how higher-level protocol negotiations (i.e. SIP/SDP, H.323/H.245, etc.) may really only make sense in establishing what an RTP Encoder (transmitter to the packet network) does. Therefore, if CN is not negotiated as supported, it should not be activated or used. VAD should only be allowed when negotiated as supported, and the implementation of an IETF CN (silence indication method) should signal a clearly identifiable transition of state, as close to the actual state transition as possible, while communicating all the relevant information needed to make Comfort Noise Generation (CNG) possible.

Lee