KNK SS7-27 - first experiences - part 1

Hello Marcelo,

> What code are you using ?
> Is this not stock libss7 ? Stock libss7 can't decode ISUP SUS/RES like that.

No, it's not stock libss7. That is stated in the subject, as well as in the
first sentence of my first post. It's a special branch, available for both
Asterisk and libss7 (version 2), and the two must be used together. When I
started playing with it, I was told that I could post my experiences and
problems to this ML, which is what I just did. But you are still the only one
who has responded to me.

> 
> In my code, I explicitly ignore ALL SUS / RES, they have no needed
> processing associated with Brazilian ISUP.

We have SUS/RES in the national ISUP spec in the Czech Republic, so we must
handle it properly. However, the problem is not limited to SUS: it may appear
at any time when the A-side clears down while an MSU from the B-side (any
ordinary MSU like ACM, ANM, CON, CPG...) is already under way.
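
To illustrate what I mean, here is a minimal Python sketch (not libss7 code;
the class, state and method names are made up for illustration) of a per-CIC
record that is kept after REL is sent and absorbs late MSUs until RLC arrives
or T5 expires. It only covers the message types mentioned in this thread:

```python
from enum import Enum, auto


class CicState(Enum):
    IDLE = auto()
    ACTIVE = auto()
    WAIT_RLC = auto()  # REL sent, RLC not yet received


class Cic:
    """Hypothetical per-CIC call record; not libss7's actual data structure."""

    # Call-related messages that may still be in flight when we send REL.
    LATE_MESSAGES = {"ACM", "ANM", "CON", "CPG", "SUS", "RES"}

    def __init__(self):
        self.state = CicState.ACTIVE

    def send_rel(self):
        # Keep the record: the peer may not have seen our REL yet.
        self.state = CicState.WAIT_RLC
        return "REL"

    def receive(self, msg):
        """Return the action taken for an incoming ISUP message name."""
        if self.state is CicState.WAIT_RLC:
            if msg == "RLC":
                self.state = CicState.IDLE
                return "released"
            if msg in self.LATE_MESSAGES:
                return "absorbed"  # crossed our REL on the wire; ignore it
        if self.state is CicState.ACTIVE and msg in self.LATE_MESSAGES:
            return "processed"
        return "unexpected"  # only this case should lead to a circuit reset

    def t5_expired(self):
        # Give up waiting for RLC; only now is the record dropped.
        if self.state is CicState.WAIT_RLC:
            self.state = CicState.IDLE
            return "reset"
        return "ignored"
```

With such a record, a SUS that crosses our REL is absorbed, and the RLC that
follows is matched to the REL instead of being reported as unsolicited.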

> 
> Asterisk and kernel dahdi version ?

Asterisk 11 branch; DAHDI kernel at the last state available from SVN (they
have since moved to git and I still haven't adapted my working copy, as I also
have many private patches in it and it will be a pain to port them to my local
git repo).

> 
> If you enable dahdi_pcap:
> # dahdi_pcap -c 16 -f /tmp/mycap.ss7
> Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
> Packets captured: 7
> 
> Then you can analyze the capture in wireshark / ethereal.
> But it has one bug, if you shutdown the owner of the link while
> dahdi_pcap is running, the system will reset on its own.
> As long as you don't leave dahdi_pcap running around, it's not a problem.

A good hint, I really didn't know about it! Thanks, I will use it (with care
to prevent system crash :-) ).
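
To cross-check single MSUs by hand, the header fields can also be decoded
directly from the hex dump that the debug output prints. A minimal Python
sketch, assuming ITU-style 14-bit point codes and a dump that starts at the
BSN octet; the function and key names are my own, not part of any library:

```python
def decode_msu_header(hex_dump):
    """Decode MTP2 sequence numbers, SIO, ITU routing label, CIC and ISUP type."""
    b = bytes.fromhex(hex_dump.replace(" ", ""))
    label = int.from_bytes(b[4:8], "little")  # 32-bit ITU routing label
    return {
        "bsn": b[0] & 0x7F, "bib": b[0] >> 7,
        "fsn": b[1] & 0x7F, "fib": b[1] >> 7,
        "li": b[2] & 0x3F,                    # length indicator
        "ni": b[3] >> 6, "si": b[3] & 0x0F,   # network indicator, user part
        "dpc": label & 0x3FFF,
        "opc": (label >> 14) & 0x3FFF,
        "sls": label >> 28,
        "cic": int.from_bytes(b[8:10], "little") & 0x0FFF,
        "msg_type": b[10],                    # e.g. 0x0c REL, 0x0d SUS, 0x10 RLC
    }


rel = decode_msu_header("bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90")
sus = decode_msu_header("c0 bd 0a 85 08 40 00 c4 0c 00 0d 01 00")
```

For the two MSUs from my trace, the REL decodes to FSN 67 and the SUS to
BSN 64, so the peer had not yet acknowledged our REL when it sent the SUS.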

With regards,
  Pavel


> 
> 
> 
> On 06/25/13 09:38, Pavel Troller wrote:
> > Hello Marcelo,
> >   so I did some tracing. It was really hard to isolate MSUs for one particular
> > connection, I had to collect them from about 5 MB file, but ok, it's done,
> > and it's in total harmony with my original ideas. So, let's look at it with
> > me:
> >
> > Initial conditions: There is a call running on LS1, DPC4097, CIC 12.
> >
> > Our Asterisk decided to clear this call down:
> >
> > [1] ISUP timer t1 (15000ms) started on CIC 12 DPC 4097
> > [1] ISUP timer t5 (300000ms) started on CIC 12 DPC 4097
> > [1] Len = 16 [ bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90 ]
> > [1] FSN: 67 FIB 1
> > [1] BSN: 60 BIB 1
> > [1] >[4097:0] MSU
> > [1] [ bc c3 0d ]
> > [1] 	Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> > [1] 	[ 85 ]
> > [1] 	OPC 8 DPC 4097 SLS 12
> > [1] 	[ 01 10 02 c0 ]
> > [1] 		CIC: 12
> > [1] 		[ 0c 00 ]
> > [1] 		Message Type: REL(0x0c)
> > [1] 		[ 0c ]
> > [1] 		--VARIABLE LENGTH PARMS[1]--
> > [1] 		Cause Indicator:
> > [1] 			Coding Standard: 0
> > [1] 			Location: 1
> > [1] 			Cause Class: 1
> > [1] 			Cause Subclass: 0
> > [1] 			Cause: Normal call clearing (16)
> > [1] 			[ 02 81 90 ]
> > [1] 
> >
> > But, the remote party also decided to hang up, and our REL just crossed
> > their SUS going back (please look at BSN and compare with our FSN, they
> > don't know about our REL yet).
> >
> > [1] Len = 13 [ c0 bd 0a 85 08 40 00 c4 0c 00 0d 01 00 ]
> > [1] FSN: 61 FIB 1
> > [1] BSN: 64 BIB 1
> > [1] <[4097:0] MSU
> > [1] [ c0 bd 0a ]
> > [1] 	Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> > [1] 	[ 85 ]
> > [1] 	OPC 4097 DPC 8 SLS 12
> > [1] 	[ 08 40 00 c4 ]
> > [1] 		CIC: 12
> > [1] 		[ 0c 00 ]
> > [1] 		Message Type: SUS(0x0d)
> > [1] 		[ 0d ]
> > [1] 		--FIXED LENGTH PARMS[1]--
> > [1] 		Suspend/Resume Indicators:
> > [1] 			SUS/RES indicator: Network initiated (1)
> > [1] 			[ 01 ]
> > [1] 
> >
> > And what happens now is a clear ******** BUG ******** in libss7: As RLC
> > has not been received yet, the call must still be considered as active!
> > But we already forgot it and now we are surprised that we got some MSU
> > about it.
> >
> > [1] Got SUS but no call on CIC 12 PC 4097
> > [1] reseting the cic
> >
> > The situation is getting complicated, we are sending RSC.
> >
> > [1] ISUP timer t1 stopped on CIC 12 DPC: 4097
> > [1] ISUP timer t5 stopped on CIC 12 DPC: 4097
> > [1] ISUP timer t17 (300000ms) started on CIC 12 DPC 4097
> > [1] Len = 11 [ bd c4 08 85 01 10 02 c0 0c 00 12 ]
> > [1] FSN: 68 FIB 1
> > [1] BSN: 61 BIB 1
> > [1] >[4097:0] MSU
> > [1] [ bd c4 08 ]
> > [1] 	Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> > [1] 	[ 85 ]
> > [1] 	OPC 8 DPC 4097 SLS 12
> > [1] 	[ 01 10 02 c0 ]
> > [1] 		CIC: 12
> > [1] 		[ 0c 00 ]
> > [1] 		Message Type: RSC(0x12)
> > [1] 		[ 12 ]
> >
> > And we get a RLC. IMHO it is a RLC confirming our REL, not
> > RSC (according to BSN, the peer already received all our MSUs,
> > but they probably already had the RLC queued, so they sent it)
> >
> > [1] 
> > [1] Len = 12 [ c4 be 09 85 08 40 00 c4 0c 00 10 00 ]
> > [1] FSN: 62 FIB 1
> > [1] BSN: 68 BIB 1
> > [1] <[4097:0] MSU
> > [1] [ c4 be 09 ]
> > [1] 	Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> > [1] 	[ 85 ]
> > [1] 	OPC 4097 DPC 8 SLS 12
> > [1] 	[ 08 40 00 c4 ]
> > [1] 		CIC: 12
> > [1] 		[ 0c 00 ]
> > [1] 		Message Type: RLC(0x10)
> > [1] 		[ 10 ]
> > [1] 
> > [1] ISUP timer t17 stopped on CIC 12 DPC: 4097
> > Linkset 1: Processing event: ISUP_EVENT_RLC
> >
> > And now, we get a second RLC, probably to our RSC. There is a jump
> > in FSN because they sent an MSU in between which was not
> > related to our call.
> >
> > [1] Len = 12 [ c4 c0 09 85 08 40 00 c4 0c 00 10 00 ]
> > [1] FSN: 64 FIB 1
> > [1] BSN: 68 BIB 1
> > [1] <[4097:0] MSU
> > [1] [ c4 c0 09 ]
> > [1] 	Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> > [1] 	[ 85 ]
> > [1] 	OPC 4097 DPC 8 SLS 12
> > [1] 	[ 08 40 00 c4 ]
> > [1] 		CIC: 12
> > [1] 		[ 0c 00 ]
> > [1] 		Message Type: RLC(0x10)
> > [1] 		[ 10 ]
> > [1] 
> >
> > And this RLC seems unsolicited to us, because we were taking the
> > first RLC as a response to our RSC, which was not the case.
> >
> > [1] Got RLC but we didn't send REL/RSC on CIC 12 PC 4097 
> >
> > So, no MSUs were received from other linksets; everything fits together
> > perfectly...
> >
> > This trace is a clear demonstration of an existing bug in libss7, which
> > may be formulated as follows: "When we are terminating the call and sending
> > REL to the remote party, we must keep the record of the connection and 
> > silently accept and absorb all MSUs, which may come back, until we receive
> > a RLC or T5 expires".
> >
> > What do you think about it ?
> >
> > With regards,
> >   Pavel
> >
> >> Another possibility is you're mixing the whole thing in a single linkset
> >> where you must use two linksets in the way you explained.
> >>
> >> Can you see those errors with just a few test calls ?
> >>
> >>
> >> I found about 20 bugs / structural design flaws in stock libss7 / dahdi
> >> mtp2 support. With my changes the mtp2/mtp3 layers are far more robust
> >> than stock libss7.
> >> Fixed all but a single one, related to knowing when the linkset is up or
> >> down, and not trying to send ISUP messages, especially IAM, through a down
> >> linkset (all sigchans down).
> >>
> >> If there's a bug, use ss7 set debug on linkset X to trace ss7 messages
> >> and track isup message flow.
> >>
> >> I used libss7 successfully with telcobridges tmedia, digitro switches,
> >> ericsson AXE, huawei NGN, Nortel DMS, several STPs, EWSS, Nec NEAX, and
> >> I'm probably missing a couple of switch types.
> >> I never ran into SS7 / ISUP bugs of other switches, always libss7, but
> >> the nature of the bugs found is nothing like what you're reporting.
> >> I started testing libss7 with those kinds of switches 5 years ago, so I
> >> have some mileage to make those statements, especially from reading and
> >> understanding a large portion of the libss7 / sig_ss7 / chan_dahdi code.
> >>
> >> The issue you're describing is caused by Asterisk getting ss7 messages
> >> that belong to another linkset or sending ss7 messages on the wrong ss7
> >> link.
> >> Check for UCIC or CFN ISUP responses.
> >>
> >>
> >>
> >> You need to define chan_dahdi.conf basically like this:
> >>
> >> ; basic ss7 / isup parameters, usually the same for the whole libss7 setup
> >> signalling=ss7
> >> ss7type=itu/ansi
> >> ss7_called_nai=subscriber/national/international/unknown
> >> ss7_calling_nai=subscriber/national/international/unknown
> >> networkindicator=national/international/...
> >>
> >> ; Your local pointcode
> >> pointcode = X
> >>
> >> ; Start definition for linkset N
> >> linkset = N
> >>
> >> adjpointcode = STP point code otherwise switch point code
> >> ; Instantiate a signalling link on channel 16 belonging to linkset N,
> >> ; with adjacency to adjpointcode
> >> sigchan = 16
> >> ; Define more signalling links if needed, with adjpointcode and sigchan
> >>
> >> defaultdpc = pointcode for ISUP messages
> >> cicbeginswith= CIC of the next voice channel defined
> >> ; Instantiate voice channel on linkset N, talking to PC defaultdpc, CIC
> >> ; numbering incremented automatically
> >> channel => dahdi channel range
> >>
> >> cicbeginswith= next CIC range, if non contiguous
> >> channel => dahdi channel range
> >>
> >> defaultdpc = another point code belonging to the same linkset (if links
> >> share signalling to multiple switches, typically links through an STP)
> >> ;repeat cicbeginswith, channel
> >>
> >> ; Starts definition of another linkset
> >> linkset = M
> >> ; repeat same sequence as above
> >>
> >>
> >> On 06/25/13 05:13, Pavel Troller wrote:
> >>> Hello Marcelo,
> >>>
> >>>> Per usual, read the fine manual. Wait, there's no manual !
> >>> You're right :-).
> >>>
> >>>> Since you seem to have done your part and actually know some ss7 and
> >>>> isup, here comes a hint.
> >>>>
> >>>> You created two or more linksets where you must have a single one.
> >>>> libss7 doesn't have the ss7 routing feature.
> >>> It seems strange to me. Let me explain this in more detail.
> >>> There is 1 (one) Asterisk box.
> >>> It has 2 (two) "linksets" configured, with 1 (one) signalling link per linkset.
> >>> Linkset 1 is configured for one DPC and with CICs 1 - 496.
> >>> Linkset 2 is configured for another (different) DPC and also with CICs 1 - 496.
> >>> Both the systems connected to this Asterisk box are configured to respond
> >>> directly to the linkset between them and the Asterisk, so it's sure that
> >>> a MSU from DPC1 cannot come over LS2 and vice versa.
> >>> I hope that this extremely simple setup is in the scope of current libss7
> >>> functionality. Or am I wrong ?
> >>>
> >>>> In libss7 the linkset concept is different from the official ss7 linkset.
> >>>>
> >>>> All signalling links that carry ISUP traffic for a given set of channels
> >>>> must be kept on a single linkset, as well as all ISUP channels that go
> >>>> through those links.
> >>> I hope that my setup is conformant with this limitation.
> >>>
> >>>> It looks like you're getting incoming signalling for ISUP channels that
> >>>> are on another linkset.
> >>> It really looks like this, but I still hope it's not the case. Please note that
> >>> the traffic on the box is rather high, and such an error occurs about once in,
> >>> say, 10000 call attempts. I think that with a routing problem as fatal as
> >>> the one you are talking about, it wouldn't be possible to use the system
> >>> regularly.
> >>>
> >>>> I'm sure you didn't find any libss7 bug.
> >>> Really strong words! I wouldn't say it for any of my programs :-).
> >>>
> >>>> I have a highly customized version of libss7/dahdi/asterisk, fixing lots
> >>>> of issue, but this isn't one of them.
> >>> Possibly your setup/usage scenario is a bit different ?
> >>>
> >>>
> >>>> Processed over one million call setups, with a very complex setup (6
> >>>> linksets, 7 links, 6E1 on a single switch, plus another 6E1 on remote
> >>>> switches using my simple STP solution, sharing the local links over SS7
> >>>> over UDP - my simpler proprietary alternative to sigtran).
> >>> These switches (I have two of them, but the second one is still on a regular
> >>> unpatched SS7 stack) make approx. 3 million call setups per week. My
> >>> record (without restarting/crashing Asterisk) is about 3 weeks with more than
> >>> 10 million calls.
> >>>
> >>>> If you need commercial support, contact me off list.
> >>> Thanks for your offer.
> >>>
> >>> With regards, Pavel
> >>>
> >>>> On 06/24/13 09:02, Pavel Troller wrote:
> >>>>> Hi!
> >>>>>   I would like to share my experience with deployment of this experimental SS7
> >>>>> branch.
> >>>>>   The first impressions are good, especially the timers seem to work well,
> >>>>> saving many calls from being frozen.
> >>>>>   However, there are still some strange things, which I would like to discuss
> >>>>> here, one by one.
> >>>>>   The first one is that the channel sometimes doesn't recognize a message
> >>>>> (mostly RLC), even though it results from an action initiated by the channel
> >>>>> itself. Typically, the following appears often:
> >>>>>
> >>>>> [Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP timer t17 expired on CIC 27 DPC 4097
> >>>>> [1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic
> >>>>>
> >>>>>   As I understand, there were some timeouts and now the channel tries to
> >>>>> recover by sending RSC and firing T17. However, it seems that it immediately
> >>>>> rejects RLC, which comes back as a response to the RSC which was just sent
> >>>>> upon expiry of T17. And this appears again and again in the rhythm of T17,
> >>>>> and the channel is not operational.
> >>>>> ss7 show calls shows the following line for the misbehaving CIC:
> >>>>>    27  4097  11  IAM                       IAM
> >>>>>  
> >>>>>   Or, a very similar situation:
> >>>>> [2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
> >>>>> [2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC
> >>>>>
> >>>>>   The first question is why there was no call when SUS was received. My
> >>>>> idea is that both parties hung up their phones at the same time and
> >>>>> that the call was undergoing destruction on the Asterisk side (REL just sent
> >>>>> or something like this) when SUS arrived. Maybe the call was marked as
> >>>>> cleared even before RLC came back ? OK, I can understand this. But
> >>>>> if the CIC was reset as the first message says (i.e. RSC was sent), why is
> >>>>> the RLC coming back not recognized then ?
> >>>>>
> >>>>> Or, just now the following appeared:
> >>>>>
> >>>>> [1] Got ACM but we didn't send IAM on CIC 10 PC 4097 reseting the cic
> >>>>> [1] Got RLC but we didn't send REL/RSC on CIC 10 PC 4097 reseting the cic
> >>>>>
> >>>>> Again, it's unclear why this happened, but the second line seems
> >>>>> to indicate some brokenness again.
> >>>>>
> >>>>> To explain: The channel is operating on a gateway equipped with 16 E1s
> >>>>> and current traffic is about 10 CAPS, there are two linksets to two
> >>>>> cooperating exchanges. They are EWSDs, which have very mature and stable
> >>>>> SS7, so I'm almost sure that they are not making signalling errors.
> >>>>>
> >>>>> With regards,
> >>>>>   Pavel
> >>>>>
> >>
> >> -- 
> >> Atenciosamente,
> >>
> >> Marcelo Pacheco
> >> M2J Comunicações e Informática
> >> Fixo: (27)2222-8118 / (27)2233-2296
> >> Vivo: (27)9964-5440
> >> Claro: (27)9312-5319
> >> MSN: marcelo at macp.eti.br
> >> E-mail: marcelo at m2j.com.br
> 
> 
> -- 
> Atenciosamente,
> 
> Marcelo Pacheco
> M2J Comunicações e Informática
> Fixo: (27)2222-8118 / (27)2233-2296
> Vivo: (27)9964-5440
> Claro: (27)9312-5319
> MSN: marcelo at macp.eti.br
> E-mail: marcelo at m2j.com.br