KNK SS7-27 - first experiences - part 1

Hi Kaloyan,

> Hi all,
> sorry for joining so late, but I am on holiday (until the end of the week)
> and rarely check my mailbox. Thanks to bad weather I did so today :)

Never mind, I'm happy you're here!

>
> To the OP:
> while reading the first posts I thought it was the old problem with the
> REL/RSC loop (persistent on start with ANSI signaling), which was fixed in
> libss7 instead of sig_ss7, but I'm not sure whether it is the same issue or
> a similar yet different one. It really is a (remaining) problem if we
> receive the RLC for a previous REL after we have already sent RSC. I was
> thinking of clearing the old status bits when we receive RLC, but that
> would not fix the double-RLC problem, and we can't ignore the first RLC (or
> just clear the SENT_REL flag), because we may never get a second one. So it
> is probably better to skip sending a second RSC inside
> isup_handle_unexpected() if the previous one was sent less than T17 (the
> timer value in seconds) ago. Because the timer is stopped on RLC, this
> needs another timer, or some flag to ignore its expiration and not reset
> again ... I will work on this next week when I am back.
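
For reference, the throttle described above could be sketched roughly as
follows. This is only an illustration of the idea, not libss7 code: struct
demo_cic, demo_should_send_rsc() and the millisecond timestamps are names and
conventions assumed for the example, and the caller is assumed to pass in
the current time and the configured T17 value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-circuit bookkeeping; not a libss7 structure. */
struct demo_cic {
    bool rsc_sent;          /* an RSC has been sent at least once */
    uint64_t last_rsc_ms;   /* when it was sent, in milliseconds */
};

/*
 * Decide whether an unexpected message should trigger a new RSC.
 * Returns true and records the send time if no RSC went out within
 * the last t17_ms milliseconds; returns false otherwise.
 */
bool demo_should_send_rsc(struct demo_cic *cic, uint64_t now_ms, uint64_t t17_ms)
{
    if (cic->rsc_sent && now_ms - cic->last_rsc_ms < t17_ms)
        return false;       /* previous RSC is still recent, stay quiet */
    cic->rsc_sent = true;
    cic->last_rsc_ms = now_ms;
    return true;
}
```

Because the timestamp is kept separately from the T17 timer itself, stopping
the timer on RLC would not erase the information that an RSC went out
recently.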

I think it's a different problem. I sometimes see this kind of loop too; it can
last for hours until it eventually settles by itself. But the error I've
reported here is that we clear the old status flags immediately after sending
our REL. If an MSU is already on its way back (it may be any common MSU such as
ACM, CPG, ANM, SUS, RES, REL...; I've encountered all of these at least), we
don't expect it, so we call isup_handle_unexpected() and send RSC. That RSC is
completely unnecessary, because there is nothing wrong with the call state: we
just have to ignore this (and possibly any other) MSU until we get the RLC
acknowledging our REL.
  My patch does this by checking ISUP_SENT_REL. It might be better to postpone
clearing the got_sent_msg flags from isup_rel() to the ISUP_RLC case in
isup_receive(), but I didn't know whether leaving these flags set after sending
REL would do harm somewhere, so I did it as described. About 300 thousand calls
yesterday didn't reveal any problem with the patch. So today I removed the
ss7_message() calls from my patch, and since then Asterisk has been very quiet
and seems very happy, and so do the cooperating EWSDs :-).
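
To make the effect of the guard concrete, here is a minimal stand-alone model
of the flag handling. It is a sketch under assumptions, not the libss7 code:
the flag values, struct demo_call, demo_send_rel() and demo_receive() are
invented for the example, and DEMO_RESET_CIC merely stands in for the
isup_handle_unexpected() path that sends RSC. Only the flag names and the
order of the two checks follow the patch.

```c
/* Hypothetical flag values; libss7 defines its own. */
#define ISUP_SENT_IAM (1 << 0)
#define ISUP_SENT_REL (1 << 1)

enum demo_msg { DEMO_ACM, DEMO_RLC };
enum demo_action { DEMO_PROCESS, DEMO_IGNORE, DEMO_RESET_CIC };

struct demo_call {
    unsigned int got_sent_msg;
};

/* Mirrors what isup_rel() does to the flags: mark REL, forget call state. */
void demo_send_rel(struct demo_call *c)
{
    c->got_sent_msg |= ISUP_SENT_REL;
    c->got_sent_msg &= ~ISUP_SENT_IAM;
}

/* guard != 0 enables the ISUP_SENT_REL check added by the patch. */
enum demo_action demo_receive(struct demo_call *c, enum demo_msg m, int guard)
{
    switch (m) {
    case DEMO_ACM:
        if (guard && (c->got_sent_msg & ISUP_SENT_REL))
            return DEMO_IGNORE;     /* late ACM: drop it and wait for RLC */
        if (!(c->got_sent_msg & ISUP_SENT_IAM))
            return DEMO_RESET_CIC;  /* treated as unexpected, RSC goes out */
        return DEMO_PROCESS;
    case DEMO_RLC:
        if (!(c->got_sent_msg & ISUP_SENT_REL))
            return DEMO_RESET_CIC;  /* RLC without a preceding REL */
        c->got_sent_msg = 0;        /* release complete, circuit is idle */
        return DEMO_PROCESS;
    }
    return DEMO_RESET_CIC;
}
```

In this model, a call that has sent IAM and then REL answers a late ACM with
DEMO_RESET_CIC when guard is 0 and with DEMO_IGNORE when guard is 1; the
following RLC is accepted normally and clears the flags.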

With regards,
  Pavel

>
> The code in my branch is actually Domjan Attila's version (the patches 
> attached to the SS7-27 issue) ported to later Asterisk versions with very 
> few additions/modifications, so the muffins are for him, while the bugs are 
> from me :)
>
> P.S.
> apologies for top posting - the connection is unstable and i had to write 
> the post offline and just copy/paste it
>
> On 2013-06-26 06:42, Pavel Troller wrote:
>> Hi!
>> So, I'm replying to my own original post, to keep the question and a
>> possible answer together without any excessive or unrelated information.
>> I hope I've found the cause of the problem and I hope I solved it. A
>> modified libss7 is now online and I'm waiting for busy hours to see
>> whether it will help.
>> The problem is, that in the isup_rel() function, all the important
>> got_sent_msg flags are cleared, so the stack "forgets" a preceding call
>> state:
>> ... isup_rel():
>> c->got_sent_msg |= ISUP_SENT_REL;
>> c->got_sent_msg &= ~(ISUP_SENT_IAM | ISUP_PENDING_IAM |
>> ISUP_CALL_CONNECTED | ISUP_GOT_IAM | ISUP_GOT_CCR | ISUP_SENT_INR);
>> ...
>> So, an incoming MSU, which was perfectly legitimate before sending REL,
>> is now handled as unexpected.
>> My solution adds the following code to the isup_receive() function for
>> every message that can confuse the stack in this way
>> (an example for the ACM message):
>> case ISUP_ACM:
>> +        if (c->got_sent_msg & ISUP_SENT_REL) {
>> +                ss7_message(ss7, "Got unexpected ACM after sending REL on CIC %d PC %d, ignoring ", c->cic, opc);
>> +                return 0;
>> +        }
>>
>>          if (!(c->got_sent_msg & ISUP_SENT_IAM)) {
>>                  ss7_message(ss7, "Got ACM but we didn't send IAM on CIC %d PC %d ", c->cic, opc);
>>                  return isup_handle_unexpected(ss7, c, opc);
>>          }
>>
>> If my change proves good, I'm planning to remove the ss7_message() calls to
>> limit the stack verbosity, as these situations are relatively frequent
>> under heavy load and I think they are more or less logical and normal.
>>
>> I would be glad for some words from the KNK branch maintainer(s) on whether
>> to create a JIRA issue and put my patch there, or how to proceed in general.
>>
>> With regards,
>> Pavel
>>
>>
>>
>> Hi!
>> I would like to share my experience with the deployment of this
>> experimental SS7 branch.
>> The first impressions are good; especially the timers seem to work well,
>> saving many calls from being frozen.
>> However, there are still some strange things, which I would like to
>> discuss here, one by one.
>> The first one is that the channel sometimes doesn't recognize a message
>> (mostly RLC), even though it comes in response to an action initiated by
>> the channel itself.
>> Typically, the following appears often:
>>
>> [Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP timer t17 expired on CIC 27 DPC 4097
>> [1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic
>>
>> As I understand it, there were some timeouts and now the channel tries to
>> recover by sending RSC and starting T17. However, it seems that it
>> immediately rejects the RLC that comes back in response to the RSC which
>> was just sent upon expiry of T17. This repeats again and again in the
>> rhythm of T17, and the channel is not operational.
>> ss7 show calls shows the following line for the misbehaving CIC:
>> 27  4097  11  IAM                       IAM
>>
>> Or, a very similar situation:
>> [2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
>> [2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC
>>
>> The first question is why there was no call when SUS was received. My
>> idea is that both parties hung up their phones at the same time and
>> that the call was being torn down on the Asterisk side (REL just sent
>> or something like this) when SUS arrived. Maybe the call was marked as
>> cleared even before RLC came back? OK, I can understand this. But
>> if the CIC was reset as the first message says (i.e. RSC was sent), why
>> is the returning RLC not recognized then?
>>
>> Or, just now the following appeared:
>>
>> [1] Got ACM but we didn't send IAM on CIC 10 PC 4097 reseting the cic
>> [1] Got RLC but we didn't send REL/RSC on CIC 10 PC 4097 reseting the cic
>>
>> Again, it's questionable why this happened, but the second line seems
>> to indicate some brokenness again.
>>
>> To explain: The channel is operating on a gateway equipped with 16 E1s
>> and current traffic is about 10 CAPS, there are two linksets to two
>> cooperating exchanges. They are EWSDs, which have very mature and stable
>> SS7, so I'm almost sure that they are not making signalling errors.
>>
>> With regards,
>> Pavel
>>
>> --
>> _____________________________________________________________________
>> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
>>
>> asterisk-ss7 mailing list
>> To UNSUBSCRIBE or update options visit:
>> http://lists.digium.com/mailman/listinfo/asterisk-ss7
>>
>


