Thanks for the comments, Matt. Responses below:

Matt Mathis wrote:
> I reviewed draft-ietf-sipping-overload-reqs-02 at the request of the transport
> area directors. Note that my area of expertise is TCP, congestion control and
> bulk data transport. I am not a SIP expert, and have not been following the
> SIP documents.
>
> I have serious concerns about this document because it explicitly excludes the
> only approach for coping with overload that is guaranteed to be robust under
> all conditions. Although I know it is considered bad form to describe
> solutions while debating requirements, I think a sketch of a solution will
> greatly clarify the discussion of the requirements.
>
> The only robust overload signal is the natural implicit signal - silently
> discarding excess requests. Explicit overload messages (code 503) should be
> optional, and must have an explicit rate limit.

Agree. Our intention for the solution was exactly that: an explicit feedback mechanism (like the one ECN provides) that can be used in addition to treating the lack of any signal as a sign of congestion.

> Sending additional messages to explicitly indicate overload is intrinsically
> fragile.

Agree too. SIP requests normally generate responses, so the plan is to have a response code that can be used to clearly say "I'm overloaded". This is not an additional message - it's the normal SIP message that is sent - but with a clear meaning. And of course, lack of any response at all needs to be treated as a sign of congestion too.

> My specific objections to the document are as follows: Requirement 6 calls for
> explicit overload messages and forbids silently discarding requests, since
> they are not unambiguous in their meaning.

That was not the intent of the requirement. The requirement is meant to say that any explicit message used to signal overload must be used solely for that purpose, and not to signal other, non-overload-related events.
I've reworded to say:

<t hangText="REQ 6:">When overload is signaled by means of a specific message, the message must clearly indicate that it is being sent because of overload, as opposed to other, non-overload based failure conditions. This requirement is meant to avoid some of the problems that have arisen from the reuse of the 503 response code for multiple purposes. Of course, overload is also signaled by lack of response to requests. This requirement applies only to explicit overload signals. </t>

> Requirement 15 seems to provide a
> loophole (allowing complete failures) but seems to forbid using it as the
> preferred mechanism.

Per above, the intention all along was to treat lack of a response as an indication of congestion. The requirement most certainly does not limit itself to complete failures; it calls out overload as the first cause of this problem. Nor does the requirement forbid lack of a response from being the preferred mechanism. The requirement reads:

<t hangText="REQ 15:"> In cases where a network element fails, is so overloaded that it cannot process messages, or cannot communicate due to a network failure or network partition, it will not be able to provide explicit indications of its levels of congestion. The mechanism should properly function in these cases. </t>

I think this is pretty clear and it directly addresses your concern - the solution has to work in cases where there is no response whatsoever. Can you suggest alternate text that would improve it?

> Requirement 8 does not make sense without explicit
> notification.

Reworded to:

<t hangText="REQ 8:"> The mechanism shall ensure that, when a request was not processed successfully due to overload (or failure) of a downstream element, the request will not be retried on another element which is also overloaded or whose status is unknown. This requirement derives from REQ 1. </t>

which handles both explicit and implicit overload signals.
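To illustrate the reworded REQ 8, here is a minimal Python sketch of how a sender might choose retry targets. It is not taken from the draft: the names `Status`, `record_failure`, and `retry_candidates`, the element names, and the idea of a per-element status table are all assumptions made for the example. The point is only that an explicit overload response and a timeout lead to the same state, and that elements with unknown status are skipped.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"                  # recently known not to be overloaded
    OVERLOADED = "overloaded"  # explicit overload response, or no response at all
    UNKNOWN = "unknown"        # no recent information about this element


def record_failure(status, element):
    """Mark an element overloaded after an explicit overload response or a timeout."""
    status[element] = Status.OVERLOADED


def retry_candidates(status, failed):
    """Return the elements a failed request may be retried on under REQ 8.

    Elements that are overloaded or whose status is unknown are excluded,
    as is the element that just failed.
    """
    return [e for e, s in status.items() if e != failed and s is Status.OK]


# Hypothetical downstream elements and their last known status.
status = {"proxy-a": Status.OK, "proxy-b": Status.UNKNOWN, "proxy-c": Status.OK}

# proxy-a either sent an explicit overload response or never answered.
record_failure(status, "proxy-a")
print(retry_candidates(status, "proxy-a"))  # ['proxy-c']
```

A real mechanism would also need some way to age entries back to unknown and to learn that an element has recovered; that is left out here.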
> Requirements 7, 8 and 9 should note that they can be (are
> already?) equivalently satisfied by properly structured exponential
> retransmission backoff timers in SIP itself.

Requirements 8 and 9 deal with sending requests to elements other than the one that was overloaded. That case is not handled by the structured exponential backoff timers in SIP, which govern retransmissions of a request within a single transaction to a single server. These requirements deal with behavior across different servers and different transactions.

Requirement 7 is partly addressed by SIP's retransmit behavior. However, those timers apply independently to each transaction, and when there are a large number of transactions between a pair of servers, they are not sufficient to prevent overload. This requirement is meant to improve on that situation.

>
> I would like to point out that TCP, IP and several other transport protocols
> have evolved in the same direction as I am advocating for SIP: the only robust
> indication that an error has occurred is connection failure.

True, and we absolutely need to utilize that. However, I do not believe this eliminates the utility of explicit congestion indicators, such as those ECN provides, as a way to further improve performance.

> Error messages
> are cached and sometimes accelerate timers (e.g. retransmit now, or go to the
> next IP address now), but do not change basic protocol behavior. Error
> messages are most often rate limited at the sender and the saved error codes
> are used to provide a clue why something failed, but the fact that it failed
> most likely comes from a timer, not the message itself. The number of error
> massages that are required for correct operation is declining (note that 4821
> makes ICMP can't fragment optional), and may be zero.
>
> Rate limiting all errors messages and treating them as advisory improves
> robustness in several ways: fraudulent messages have less impact, error
> messages can not be used an DDOS attack magnifiers, and overload is addressed
> implicitly by silently discarding requests.
>
> Note that the normal, non-crisis, behavior has not changed significantly:
> error message are sent, cached and reported to the application. However, in a
> crisis, the error reporting degrades gracefully, while the throughput goes
> flat, without any negative slope. This is where SIP (and all other protocols)
> should strive to be.

Right - the explicit signals are meant for exactly these periods of overload, not for periods of crisis.

Thanks,
Jonathan R.

--
Jonathan D. Rosenberg, Ph.D.   499 Thornall St.
Cisco Fellow                   Edison, NJ 08837
Cisco, Voice Technology Group
jdrosen@xxxxxxxxx
http://www.jdrosen.net         PHONE: (408) 902-3084
http://www.cisco.com