KNK SS7-27 - first experiences - part 1

Hi!
  So, I'm replying to my own original post to keep the question and a
possible answer together, without any unrelated information.
  I believe I have found the cause of the problem and, I hope, fixed it. A
modified libss7 is now running live and I'm waiting for the busy hours to see
whether it helps.
  The problem is that isup_rel() clears all the important got_sent_msg
flags, so the stack "forgets" the preceding call state:
... isup_rel():
                c->got_sent_msg |= ISUP_SENT_REL;
                c->got_sent_msg &= ~(ISUP_SENT_IAM | ISUP_PENDING_IAM | ISUP_CALL_CONNECTED | ISUP_GOT_IAM | ISUP_GOT_CCR | ISUP_SENT_INR);
...
  So an incoming MSU that was perfectly legitimate before REL was sent
is now handled as unexpected.
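
  To illustrate the effect in isolation, here is a minimal sketch. It is
not libss7 code: the struct, the function names and the bit values are
hypothetical and reduced to two flags, only the |= / &= ~ pattern is taken
from the isup_rel() excerpt above.

```c
/* Hypothetical bit values, for illustration only. The real flag
 * definitions live in the libss7 sources and may differ. */
#define DEMO_FLAG_SENT_IAM (1u << 0)
#define DEMO_FLAG_SENT_REL (1u << 1)

/* Hypothetical stand-in for the per-call state kept by the stack. */
struct demo_call {
    unsigned int got_sent_msg;
};

/* Same flag pattern as the isup_rel() excerpt, reduced to two flags:
 * remember that REL was sent, forget that IAM was sent. */
static void demo_send_rel(struct demo_call *c)
{
    c->got_sent_msg |= DEMO_FLAG_SENT_REL;
    c->got_sent_msg &= ~DEMO_FLAG_SENT_IAM;
}

/* Returns 1 if an incoming ACM would pass the "did we send IAM?" test. */
static int demo_acm_expected(const struct demo_call *c)
{
    return (c->got_sent_msg & DEMO_FLAG_SENT_IAM) != 0;
}
```

  With got_sent_msg initialised to DEMO_FLAG_SENT_IAM, demo_acm_expected()
returns 1. After demo_send_rel() it returns 0, although the far end may
already have an ACM in flight for that very IAM.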
  My solution adds the following check to the isup_receive() function for
every message type that can confuse the stack in this way (an example for
the ACM message):
                case ISUP_ACM:
+                       if (c->got_sent_msg & ISUP_SENT_REL) {
+                               ss7_message(ss7, "Got unexpected ACM after sending REL on CIC %d PC %d, ignoring ", c->cic, opc);
+                               return 0;
+                       }

                        if (!(c->got_sent_msg & ISUP_SENT_IAM)) {
                                ss7_message(ss7, "Got ACM but we didn't send IAM on CIC %d PC %d ", c->cic, opc);
                                return isup_handle_unexpected(ss7, c, opc);
                        }
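
The order of the two checks is the important part. A self-contained sketch
of the decision, outside of libss7, could look like this. The enum, the
function name and the bit values are hypothetical and exist only for this
example; the real code works on the call structure and calls
isup_handle_unexpected() as shown above.

```c
/* Hypothetical bit values, for illustration only. */
#define GUARD_SENT_IAM (1u << 0)
#define GUARD_SENT_REL (1u << 1)

/* Hypothetical result codes naming the three possible outcomes. */
enum guard_acm_action {
    GUARD_ACM_IGNORE,      /* REL already sent: drop the late ACM quietly */
    GUARD_ACM_UNEXPECTED,  /* no IAM and no REL sent: really unexpected   */
    GUARD_ACM_PROCESS      /* IAM sent, no REL yet: normal ACM handling   */
};

/* The REL test must come first. Once REL has been sent the IAM flag is
 * already cleared, so the second test alone would misclassify the ACM. */
static enum guard_acm_action guard_classify_acm(unsigned int got_sent_msg)
{
    if (got_sent_msg & GUARD_SENT_REL)
        return GUARD_ACM_IGNORE;
    if (!(got_sent_msg & GUARD_SENT_IAM))
        return GUARD_ACM_UNEXPECTED;
    return GUARD_ACM_PROCESS;
}
```

Only the second outcome leads to the recovery path, so a late ACM after our
own REL no longer causes the CIC to be reset.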

If my change proves good, I plan to remove the ss7_message() call to
limit the stack's verbosity, as these situations are relatively frequent under
heavy load and I think they are more or less logical and normal.

  I would be glad to hear from the KNK branch maintainer(s) whether I should
create a JIRA issue and attach my patch there, or how to proceed in general.

With regards,
   Pavel



> Hi!
>   I would like to share my experience with deploying this experimental SS7
> branch.
>   The first impressions are good, especially the timers seem to work well,
> saving many calls from being frozen.
>   However, there are still some strange things, which I would like to discuss
> here, one by one.
>   The first one is that the channel sometimes doesn't recognize a message
> (mostly RLC), even though it results from an action initiated by the channel
> itself. Typically, the following appears often:
> 
> [Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP timer t17 expired on CIC 27 DPC 4097
> [1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic
> 
>   As I understand it, there were some timeouts and now the channel tries to
> recover by sending RSC and starting T17. However, it seems that it immediately
> rejects the RLC that comes back in response to the RSC which was just sent
> upon expiry of T17. This repeats again and again in the rhythm of T17,
> and the channel is not operational.
> ss7 show calls shows the following line for the misbehaving CIC:
>    27  4097  11  IAM                       IAM
>  
>   Or, a very similar situation:
> [2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
> [2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC
> 
>   The first question is why there was no call when SUS was received. My
> idea is that both parties hung up at the same time and that the call was
> being torn down on the Asterisk side (REL just sent or something like
> this) when SUS arrived. Maybe the call was marked as cleared even before
> RLC came back? OK, I can understand this. But if the CIC was reset as the
> first message says (i.e. RSC was sent), why is the returning RLC not
> recognized then?
> 
> Or, just now the following appeared:
> 
> [1] Got ACM but we didn't send IAM on CIC 10 PC 4097 reseting the cic
> [1] Got RLC but we didn't send REL/RSC on CIC 10 PC 4097 reseting the cic
> 
> Again, it's questionable why this happened, but the second line seems
> to indicate some brokenness again.
> 
> To explain: the channel is operating on a gateway equipped with 16 E1s
> and the current traffic is about 10 CAPS. There are two linksets to two
> cooperating exchanges. They are EWSDs, which have a very mature and stable
> SS7 implementation, so I'm almost sure that they are not making signalling
> errors.
> 
> With regards,
>   Pavel
> 
> asterisk-ss7 mailing list
> To UNSUBSCRIBE or update options visit:
>    http://lists.digium.com/mailman/listinfo/asterisk-ss7