KNK SS7-27 - first experiences - part 1

Hi,

On 2013-06-27 07:48, Pavel Troller wrote:
> Hi Kaloyan,
> 
> Hi all,
> sorry for joining so late, but I am on holiday (until the end of the week)
> and rarely check my mailbox. Thanks to bad weather I did that today :)
> 
> Never mind, I'm happy you're here!
> 
> 
> To the OP:
> while reading the first posts I thought it was the old problem with the
> REL/RSC loop (persistent on start with ANSI signaling), which was fixed in
> libss7 instead of sig_ss7, but I am not sure whether this is a similar yet
> different one or the same issue. It really is a (remaining) problem if we
> receive RLC for a previous REL after we have already sent RSC. I was
> thinking of clearing the old status bits after we receive RLC, but this
> will not fix the double-RLC-received problem, and we can't ignore the
> first one (or just clear the SENT_REL flag), because we may never get a
> second one. So it would probably be better to skip sending a second RSC
> inside isup_handle_unexpected() if the previous one was sent within T17
> (timer seconds). Because the timer is stopped on RLC, there would have to
> be another timer, or some flag to ignore its expiration and not reset
> again ... I will work on this next week when I am back.
> 
> I think it's another problem. I also sometimes see this kind of loop,
> lasting for hours until it somehow settles by itself. But the error I've
> reported here is that we clear the old status flags immediately after
> sending our REL. If an MSU is already on its way back (it may be any
> common MSU like ACM, CPG, ANM, SUS, RES, REL...; I've encountered all of
> these at least), we don't expect it, so we call isup_handle_unexpected()
> and send RSC. That is completely superfluous, because there is nothing
> wrong with the call state; we just have to ignore this (and possibly any
> other) MSU until we get the RLC acknowledging our REL.
> My patch does this by checking ISUP_SENT_REL. However, it might be better
> to postpone clearing the got_sent_msg flags from isup_rel() to the
> ISUP_RLC case in isup_receive(). I didn't know whether leaving these flags
> set after sending REL would do harm somewhere, so I did it as written, and
> about 300 thousand calls yesterday didn't reveal any problem with the
> patch. So today I removed the ss7_message() calls from my patch, and since
> then Asterisk has been very quiet and seems very happy, and the
> cooperating EWSDs as well :-).
> 
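
The alternative mentioned above (keeping the got_sent_msg history until RLC arrives instead of clearing it in isup_rel()) could look roughly like the following. This is only a standalone model, not libss7 code: the flag names come from this thread, but the bit values, the demo_call struct and the demo_* functions are made up for illustration. It also does not answer the open question of whether leaving the flags set does harm elsewhere.

```c
/* Hypothetical bit values; only the flag names come from this thread. */
enum {
    ISUP_SENT_IAM       = 1u << 0,
    ISUP_CALL_CONNECTED = 1u << 1,
    ISUP_SENT_REL       = 1u << 2
};

/* Hypothetical stand-in for the per-call state kept by the stack. */
struct demo_call {
    unsigned int got_sent_msg;
};

/* Sending REL only records that REL was sent; the call history is kept. */
static void demo_send_rel(struct demo_call *c)
{
    c->got_sent_msg |= ISUP_SENT_REL;
}

/* The history is dropped only once RLC acknowledges our REL. */
static void demo_receive_rlc(struct demo_call *c)
{
    c->got_sent_msg &= ~(ISUP_SENT_REL | ISUP_SENT_IAM | ISUP_CALL_CONNECTED);
}

/* Same test the ACM case uses: did we send an IAM for this call? */
static int demo_acm_is_expected(const struct demo_call *c)
{
    return (c->got_sent_msg & ISUP_SENT_IAM) != 0;
}
```

In this model an ACM that arrives between REL and RLC still passes the "did we send IAM" check, so it would not end up in isup_handle_unexpected(); after RLC the state is empty again.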

I have just uploaded a new version to review 2150. It ignores unexpected 
messages only when we are waiting for RLC and the relevant timers 
(ISUP_T1, T5, T16 and T17) are active. Ignoring them otherwise is a bit 
risky: we may never get RLC for our REL, whereas the timers guarantee 
that we will resend it or send RSC in that case.


> With regards,
> Pavel
> 
> 
> The code in my branch is actually Domjan Attila's version (the patches
> attached to the SS7-27 issue) ported to later Asterisk versions with very
> few additions/modifications, so the muffins are for him, while the bugs
> are from me :)
> 
> P.S.
> Apologies for top posting: the connection is unstable and I had to write
> the post offline and just copy/paste it.
> 
> On 2013-06-26 06:42, Pavel Troller wrote:
> Hi!
> So, I'm replying to my own original post, to keep the question and a
> possible answer together without any excessive or unrelated information.
> I hope I've found the cause of the problem and I hope I've solved it. A
> modified libss7 is now online and I'm waiting for the busy hours to see
> whether it helps.
> The problem is that in the isup_rel() function, all the important
> got_sent_msg flags are cleared, so the stack "forgets" the preceding call
> state:
> ... isup_rel():
> c->got_sent_msg |= ISUP_SENT_REL;
> c->got_sent_msg &= ~(ISUP_SENT_IAM | ISUP_PENDING_IAM |
> ISUP_CALL_CONNECTED | ISUP_GOT_IAM | ISUP_GOT_CCR | ISUP_SENT_INR);
> ...
> So an incoming MSU, which was perfectly legitimate before sending REL,
> is now handled as unexpected.
> My solution adds the following code to the isup_receive() function for
> every message that can confuse the stack in this way
> (an example for the ACM message):
> case ISUP_ACM:
> +       if (c->got_sent_msg & ISUP_SENT_REL) {
> +               ss7_message(ss7, "Got unexpected ACM after sending REL on CIC %d PC %d, ignoring ", c->cic, opc);
> +               return 0;
> +       }
> 
>         if (!(c->got_sent_msg & ISUP_SENT_IAM)) {
>                 ss7_message(ss7, "Got ACM but we didn't send IAM on CIC %d PC %d ", c->cic, opc);
>                 return isup_handle_unexpected(ss7, c, opc);
>         }
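
The effect described above can be reproduced in a small standalone model. It is a sketch under assumptions, not libss7 code: the flag names and the masking in model_rel() follow the quoted snippets, while the bit values, the model_call struct, the acm_result enum and the guard_enabled switch are invented for the illustration.

```c
/* Hypothetical bit values; the real libss7 definitions may differ. */
enum {
    ISUP_SENT_IAM       = 1u << 0,
    ISUP_PENDING_IAM    = 1u << 1,
    ISUP_CALL_CONNECTED = 1u << 2,
    ISUP_GOT_IAM        = 1u << 3,
    ISUP_GOT_CCR        = 1u << 4,
    ISUP_SENT_INR       = 1u << 5,
    ISUP_SENT_REL       = 1u << 6
};

/* Hypothetical stand-in for the call structure. */
struct model_call {
    unsigned int got_sent_msg;
};

enum acm_result { ACM_ACCEPTED, ACM_IGNORED, ACM_UNEXPECTED };

/* Mirrors the quoted isup_rel() lines: set SENT_REL, forget the history. */
static void model_rel(struct model_call *c)
{
    c->got_sent_msg |= ISUP_SENT_REL;
    c->got_sent_msg &= ~(ISUP_SENT_IAM | ISUP_PENDING_IAM |
        ISUP_CALL_CONNECTED | ISUP_GOT_IAM | ISUP_GOT_CCR | ISUP_SENT_INR);
}

/* Models the ACM case, with and without the added ISUP_SENT_REL guard. */
static enum acm_result model_receive_acm(const struct model_call *c,
                                         int guard_enabled)
{
    if (guard_enabled && (c->got_sent_msg & ISUP_SENT_REL)) {
        return ACM_IGNORED;      /* late ACM after our REL: drop quietly */
    }
    if (!(c->got_sent_msg & ISUP_SENT_IAM)) {
        return ACM_UNEXPECTED;   /* would lead to isup_handle_unexpected() */
    }
    return ACM_ACCEPTED;
}
```

Starting from a call with ISUP_SENT_IAM set, an ACM is accepted; after model_rel() the same ACM is classified as unexpected without the guard and as ignored with it.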
> 
> If my change proves good, I'm planning to remove the ss7_message() calls
> to limit the stack verbosity, as these situations are relatively frequent
> under heavy load and I think they are more or less logical and normal.
> 
> I would be glad for some words from the KNK branch maintainer(s) on
> whether to create a JIRA issue and put my patch there, or how to proceed
> now in general.
> 
> With regards,
> Pavel
> 
> 
> 
> Hi!
> I would like to share my experience with the deployment of this
> experimental SS7 branch.
> The first impressions are good; especially the timers seem to work well,
> saving many calls from being frozen.
> However, there are still some strange things, which I would like to
> discuss here, one by one.
> The first one is that the channel sometimes doesn't recognize a message
> (mostly RLC), even though it comes in response to an action initiated by
> the channel itself.
> Typically, the following is appearing often:
> 
> [Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP timer t17 expired on CIC 27 DPC 4097
> [1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic
> 
> As I understand it, there were some timeouts and now the channel tries to
> recover by sending RSC and starting T17. However, it seems that it
> immediately rejects the RLC that comes back as a response to the RSC which
> was just sent upon expiry of T17. This repeats again and again in the
> rhythm of T17, and the channel is not operational.
> ss7 show calls shows the following line for the misbehaving CIC:
> 27  4097  11  IAM                       IAM
> 
> Or, a very similar situation:
> [2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
> [2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC
> 
> The first question is why there was no call when SUS was received. My
> idea is that both parties hung up their phones at the same time and
> that the call was undergoing destruction on the Asterisk side (REL just
> sent or something like this) when SUS arrived. Maybe the call was marked
> as cleared even before RLC came back? OK, I can understand this. But
> if the CIC was reset as the first message says (i.e. RSC was sent), why
> is the RLC coming back not recognized then?
> 
> Or, just now the following appeared:
> 
> [1] Got ACM but we didn't send IAM on CIC 10 PC 4097 reseting the cic
> [1] Got RLC but we didn't send REL/RSC on CIC 10 PC 4097 reseting the cic
> 
> Again, it's questionable why this happened, but the second line seems
> to indicate some brokenness again.
> 
> To explain: The channel is operating on a gateway equipped with 16 E1s
> and current traffic is about 10 CAPS, there are two linksets to two
> cooperating exchanges. They are EWSDs, which have very mature and 
> stable
> SS7, so I'm almost sure that they are not making signalling errors.
> 
> With regards,
> Pavel
> 
> --
> _____________________________________________________________________
> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
> 
> asterisk-ss7 mailing list
> To UNSUBSCRIBE or update options visit:
> http://lists.digium.com/mailman/listinfo/asterisk-ss7
> 


