possible bug: REGISTER and DNS SRV/DNS A, 407 proxy unauthorized

pierre-luc.bacon@xxxxxxxxxxxxxxxxxxxx (Pierre-Luc Bacon) · Wed, 22 Jul 2009 11:57:10 -0400

Here is a follow-up after discussing the issue with the support staff at
callcentric.com. Their answer:

        "Hello, 
        To be honest with you, what you're describing we'd consider a
        bug in whatever SIP stack is being used here. The behavior
        you're describing is very abnormal and not really the right way
        to implement this. 

        When a registration happens, there's no reason for the UA to
        start a register "dialog" with 1 IP, and then continue it with
        another IP. To date I don't remember seeing another UA behave
        this way. This is especially true since when the initial
        REGISTER message is challenged, and the 407 received, and the
        REGISTER with authentication is sent, it is all part of the same
        dialog and includes the same Call-ID and sequential CSeq's. 

        If you capture any common UA (hardware or software) such as
        ones from Counterpath, SJLabs, Linksys, Grandstream, Snom, etc.
        you will not see the behavior you are describing. 

        What should happen is that the UA requests a DNS record for the
        registrar/proxy - first for DNS SRV, then falling back to A
        records, and caches the IP address of the server(s) until the
        TTL expires. When the TTL expires it should again do a DNS
        query. The only reason for a UA to CHANGE the IP it is sending
        the requests to is: 
        A. Timeout with no response from that IP; in which case it
        should pull (or actually already have cached) the next record
        (either the next for DNS SRV based on priority/weight, or the
        next in the list from A records - since every DNS server will
        return a randomly sorted list of A records); and send the
        request to the next IP. 
        B. The TTL expires for the DNS records. 

        Unless either A or B happens, there's no reason for a UA to
        switch to sending its requests to another IP. 

        BTW, you mentioned also that: 
        "Asterisk seems to implement the 'right' behaviour:"
        https://issues.asterisk.org/bug_view_advanced_page.php?bug_id... 
        This is somewhat true... Asterisk does seem to be getting better
        at dealing with DNS SRV and A records. Previously Asterisk just
        completely ignored weight/priority in DNS SRV and just took the
        1 record returned; it also ignored sending to the same IP and
        would re-register every time to a new IP, as well as
        "forgetting" which IP it had registered to previously which
        causes/caused problems with inbound calls. I wouldn't call
        Asterisk implementation ideal, but it does seem to be getting
        better; this is another reason we don't use Asterisk in-house. 

        I haven't re-read the RFC to try to find a specific
        sentence/section that contradicts your statement; but RFCs are
        unfortunately never as strict as industry practice; and to date
        we've never seen a device that acts in the way you are
        suggesting. I don't believe it is written (and I don't remember
        it being) that each REGISTER request should perform a new DNS
        lookup, and this also doesn't make a lot of sense in general.
        Even if it is explicitly written this way, I don't think the
        industry has interpreted it this way when implementing any of
        the popular SIP stacks that are used by OEMs; so I don't think
        the onus is on us in this case to address this abnormal
        behavior. 

        Thank you."

As a temporary fix in our application (sflphone), I implemented a
workaround where the user can choose to perform a DNS lookup before
registering and then use that same IP address for as long as the
software is open. Not pretty, but at least registration is no longer a
probabilistic operation.
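
A minimal sketch of that idea, combined with the two switching rules from
the reply above (move to the next record on timeout, look up again when the
TTL expires). This is not the sflphone or PJSIP code: the class name, the
resolver signature and the injected clock are all hypothetical, and the
resolver is passed in so that no particular DNS library is assumed.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical resolver signature (an assumption for this sketch): given a host
# name, return (addresses in preference order, TTL in seconds).
Resolver = Callable[[str], Tuple[List[str], float]]


class PinnedRegistrar:
    """Keep sending to one resolved address until a timeout or TTL expiry."""

    def __init__(self, host: str, resolve: Resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.host = host
        self._resolve = resolve
        self._clock = clock
        self._addrs: List[str] = []
        self._index = 0
        self._expires_at = 0.0

    def _stale(self) -> bool:
        # True when nothing is cached yet or the cached records have expired.
        return not self._addrs or self._clock() >= self._expires_at

    def current(self) -> str:
        """Return the pinned address, resolving again only if the cache is stale."""
        if self._stale():
            addrs, ttl = self._resolve(self.host)
            if not addrs:
                raise LookupError(f"no addresses for {self.host}")
            self._addrs = list(addrs)
            self._index = 0
            self._expires_at = self._clock() + ttl
        return self._addrs[self._index]

    def report_timeout(self) -> str:
        """The pinned address did not answer: move to the next cached record."""
        if self._stale():
            return self.current()
        self._index = (self._index + 1) % len(self._addrs)
        return self._addrs[self._index]
```

With something like this, the initial REGISTER and the authenticated REGISTER
that follows the 407 would both go to whatever current() returns, so they
reach the same server unless a timeout or TTL expiry happens in between.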

Eight replies after I opened this thread, we still haven't heard what
Benny Prijono thinks of this issue. I would be very interested to
know. 

I think that implementing an optional "non-RFC" mode (automatically or
manually triggered by the user) would make PJSIP much more robust with a
wide range of SIP proxies/registrars.

On Mon, 2009-07-20 at 09:23 +0200, Klaus Darilion wrote:
> 
> Pierre-Luc Bacon wrote:
> > I think that we are facing a dilemma here.
> > 
> > Deviating from the RFC would make PJSIP work properly in this case (which
> > might appear with a lot more VOIP providers doing load balancing).
> > 
> > However, leaving it untouched makes the matter way more complex to deal
> > with in the application. A possible workaround and maybe the only one I
> > see, would be to resolve the host name initially, and use that IP from
> > the moment the user launched the application to the end. Not so
> > pretty ... 
> > 
> > I'm not familiar with SIP load balancing, but should a "good" load
> > balancer infrastructure be able to forward 407 challenges among its
> > servers? That way, a server which didn't send that challenge
> > initially can answer back properly to that new REGISTER. 
> 
> I think it really depends on the software used and its configuration. E.g. 
> if you use openser, you can configure it to allow nonce_reuse. Then the 
> nonce is calculated statelessly and identically in all openser instances,
> and any proxy accepts a nonce which was generated by another proxy.
> 
> Probably if a SIP proxy calculates the nonce statefully and does not share 
> the nonce between the various proxies, it does not work.
> 
> regards
> klaus
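
To illustrate what a stateless nonce shared between proxies could look like,
here is a minimal sketch. It is not openser's actual algorithm (I have not
checked how openser computes its nonces): the function names, the nonce
layout and the choice of HMAC-SHA256 are assumptions made for the example.
The idea is only that the nonce carries its own timestamp plus a MAC over it,
so any proxy that knows the same secret can validate a nonce issued by
another proxy without shared state.

```python
import hashlib
import hmac
import time
from typing import Optional


def _mac(secret: bytes, ts: str) -> str:
    # MAC over the timestamp text, keyed with the secret shared by all proxies.
    return hmac.new(secret, ts.encode("utf-8"), hashlib.sha256).hexdigest()


def make_nonce(secret: bytes, now: Optional[float] = None) -> str:
    """Build a nonce of the (assumed) form '<hex timestamp>.<hex MAC>'."""
    ts = format(int(time.time() if now is None else now), "x")
    return f"{ts}.{_mac(secret, ts)}"


def check_nonce(secret: bytes, nonce: str, max_age: int = 300,
                now: Optional[float] = None) -> bool:
    """Accept the nonce if its MAC matches and it is at most max_age seconds old."""
    ts, sep, mac = nonce.partition(".")
    if not sep:
        return False
    # Constant-time comparison; bytes are used so odd input cannot raise.
    if not hmac.compare_digest(mac.encode("utf-8"),
                               _mac(secret, ts).encode("utf-8")):
        return False
    try:
        issued = int(ts, 16)
    except ValueError:
        return False
    current = int(time.time() if now is None else now)
    return 0 <= current - issued <= max_age
```

In this sketch, a proxy that receives the authenticated REGISTER only needs
the shared secret to call check_nonce(), which is why it would not matter
which server issued the 407. A proxy that keeps its nonces in local memory
has no such option, which matches the failure described in this thread.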