I reviewed draft-ietf-sipping-overload-reqs-02 at the request of the transport area directors. Note that my area of expertise is TCP, congestion control and bulk data transport. I am not a SIP expert, and have not been following the SIP documents.

I have serious concerns about this document because it explicitly excludes the only approach for coping with overload that is guaranteed to be robust under all conditions. Although I know it is considered bad form to describe solutions while debating requirements, I think a sketch of a solution will greatly clarify the discussion of the requirements.

The only robust overload signal is the natural implicit signal: silently discarding excess requests. Explicit overload messages (code 503) should be optional, and must have an explicit rate limit. The error message may be cached (e.g. in proxies) but caching must not be required. All retransmissions in all parts of the protocol must back off exponentially (which I am told is already true for SIP).

Sending additional messages to explicitly indicate overload is intrinsically fragile. If the overload management mechanism consumes any shared resource that might be needed to complete other calls, then there exists some operating point beyond which any additional requests will cause a decline in the number of successfully completed calls. This is likely to be regenerative, with each successive error using more resources and preventing more calls, until the throughput crashes to zero. This phenomenon was readily apparent in all of the plots shown in the tsvwg meeting at IETF 71.

Note that if the explicit overload management mechanism is very complicated, the situation that triggers this failure might also be very complicated. Asserting that this hazard does not exist is probably equivalent to proving that explicit overload notifications never cause additional calls to fail, for all combinations of implementations under all operating conditions.
It would not be an easy task to prove that the standards are sufficient to guarantee this for all possible implementations.

My specific objections to the document are as follows:

Requirement 6 calls for explicit overload messages and forbids silently discarding requests, on the grounds that a silent discard is ambiguous in its meaning.

Requirement 15 seems to provide a loophole (allowing complete failures) but seems to forbid using it as the preferred mechanism.

Requirement 8 does not make sense without explicit notification.

Requirements 7, 8 and 9 should note that they can be (are already?) equivalently satisfied by properly structured exponential retransmission backoff timers in SIP itself.

I would like to point out that TCP, IP and several other transport protocols have evolved in the same direction as I am advocating for SIP: the only robust indication that an error has occurred is connection failure. Error messages are cached and sometimes accelerate timers (e.g. retransmit now, or go to the next IP address now), but do not change basic protocol behavior. Error messages are most often rate limited at the sender, and the saved error codes are used to provide a clue as to why something failed, but the fact that it failed most likely comes from a timer, not from the message itself. The number of error messages that are required for correct operation is declining (note that RFC 4821 makes ICMP "can't fragment" messages optional), and may be zero.

Rate limiting all error messages and treating them as advisory improves robustness in several ways: fraudulent messages have less impact, error messages cannot be used as DDoS attack magnifiers, and overload is addressed implicitly by silently discarding requests. Note that the normal, non-crisis behavior has not changed significantly: error messages are sent, cached and reported to the application. However, in a crisis, the error reporting degrades gracefully, while the throughput goes flat, without any negative slope.
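
The difference between a negative slope and a flat one can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of any real SIP server and not derived from the plots shown at IETF 71: a single server has a fixed work budget per second, completing a call costs one unit, sending an explicit rejection costs a fraction of a unit, and a silent discard is treated as free. The function name, the parameters and all numeric values are hypothetical.

```python
def completed_calls(offered, capacity=100.0, reject_cost=0.25,
                    error_rate_limit=None):
    """Calls completed per second in a toy single-server overload model.

    All parameters are illustrative assumptions:
      offered          -- requests arriving per second
      capacity         -- work units available per second (1 unit per call)
      reject_cost      -- work units spent per explicit rejection (0 <= cost < 1)
      error_rate_limit -- max explicit rejections per second;
                          None = reject every excess request explicitly,
                          0    = silently discard all excess requests
    """
    if offered <= capacity:
        return offered  # no overload: every request is served

    # Every excess request gets an explicit rejection, so completed calls n
    # must satisfy n + reject_cost * (offered - n) = capacity.
    unlimited = max(0.0,
                    (capacity - reject_cost * offered) / (1.0 - reject_cost))
    if error_rate_limit is None:
        return unlimited

    # With a rate limit, at most error_rate_limit rejections are sent and the
    # rest are discarded for free, so throughput cannot drop below this floor.
    floor = max(0.0, capacity - reject_cost * error_rate_limit)
    return max(unlimited, floor)


if __name__ == "__main__":
    for load in (50, 100, 200, 400, 800):
        print(load,
              round(completed_calls(load), 1),
              round(completed_calls(load, error_rate_limit=20), 1),
              round(completed_calls(load, error_rate_limit=0), 1))
```

With these made-up numbers, rejecting every excess request explicitly drives the completed calls to zero once the offered load reaches four times capacity, while capping rejections at 20 per second holds throughput at 95 calls per second no matter how high the load climbs, and pure silent discard holds it at 100. The model ignores retransmissions, which is why the exponential backoff requirement still matters separately.
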
This is where SIP (and all other protocols) should strive to be. Treating all errors as soft should have been an Internet Architectural Principle.

Thanks,
--MM--
-------------------------------------------
Matt Mathis      http://staff.psc.edu/mathis
Work:412.268.3319    Home/Cell:412.654.7529
-------------------------------------------