Hello Marcelo,

> What code are you using ?
> Is this not stock libss7 ? Stock libss7 can't decode ISUP SUS/RES like that.

No, it's not stock libss7. It's written in the subject, as well as in the
first sentence of my first post. It's a special branch, available for both
Asterisk and libss7 (version 2), which must be applied together. When I
started playing with it, I was told that I could post my experiences/problems
to this ML, which is what I just did. But you are still the only one who has
responded to me.

>
> In my code, I explicitly ignore ALL SUS / RES, they have no needed
> processing associated with Brazilian ISUP.

We have SUS/RES in the Czech Republic in the national ISUP spec, so we must
handle it properly. However, it's not just a problem with SUS; it may appear
at any time when the A-side clears down while an MSU from the B-side (any
obvious MSU like ACM, ANM, CON, CPG...) is already underway.

>
> Asterisk and kernel dahdi version ?

Asterisk 11 branch, DAHDI kernel at the last state available from SVN (they
have now moved to git and I still haven't adapted my working copy, as I also
have many private patches in it and it will be a pain to incorporate them
into my local git repo).

>
> If you enable dahdi_pcap:
> # dahdi_pcap -c 16 -f /tmp/mycap.ss7
> Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
> Packets captured: 7
>
> Then you can analyze the capture in wireshark / ethereal.
> But it has one bug, if you shutdown the owner of the link while
> dahdi_pcap is running, the system will reset on its own.
> As long as you don't leave dahdi_pcap running around, it's not a problem.

A good hint, I really didn't know about it! Thanks, I will use it (with care
to prevent a system crash :-) ).

With regards,
Pavel

>
> On 06/25/13 09:38, Pavel Troller wrote:
> > Hello Marcelo,
> > so I did some tracing.
> > It was really hard to isolate the MSUs for one particular
> > connection, I had to collect them from a file of about 5 MB, but OK, it's done,
> > and it's in total harmony with my original ideas. So, let's look at it with
> > me:
> >
> > Initial conditions: There is a call running on LS1, DPC 4097, CIC 12.
> >
> > Our Asterisk decided to clear this call down:
> >
> > [1] ISUP timer t1 (15000ms) started on CIC 12 DPC 4097
> > [1] ISUP timer t5 (300000ms) started on CIC 12 DPC 4097
> > [1] Len = 16 [ bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90 ]
> > [1] FSN: 67 FIB 1
> > [1] BSN: 60 BIB 1
> > [1] >[4097:0] MSU
> > [1] [ bc c3 0d ]
> > [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> > [1] [ 85 ]
> > [1] OPC 8 DPC 4097 SLS 12
> > [1] [ 01 10 02 c0 ]
> > [1] CIC: 12
> > [1] [ 0c 00 ]
> > [1] Message Type: REL(0x0c)
> > [1] [ 0c ]
> > [1] --VARIABLE LENGTH PARMS[1]--
> > [1] Cause Indicator:
> > [1] Coding Standard: 0
> > [1] Location: 1
> > [1] Cause Class: 1
> > [1] Cause Subclass: 0
> > [1] Cause: Normal call clearing (16)
> > [1] [ 02 81 90 ]
> > [1]
> >
> > But, the remote party also decided to hang up, and our REL just crossed
> > their SUS going back (please look at BSN and compare with our FSN, they
> > don't know about our REL yet).
> >
> > [1] Len = 13 [ c0 bd 0a 85 08 40 00 c4 0c 00 0d 01 00 ]
> > [1] FSN: 61 FIB 1
> > [1] BSN: 64 BIB 1
> > [1] <[4097:0] MSU
> > [1] [ c0 bd 0a ]
> > [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> > [1] [ 85 ]
> > [1] OPC 4097 DPC 8 SLS 12
> > [1] [ 08 40 00 c4 ]
> > [1] CIC: 12
> > [1] [ 0c 00 ]
> > [1] Message Type: SUS(0x0d)
> > [1] [ 0d ]
> > [1] --FIXED LENGTH PARMS[1]--
> > [1] Suspend/Resume Indicators:
> > [1] SUS/RES indicator: Network initiated (1)
> > [1] [ 01 ]
> > [1]
> >
> > And what happens now is a clear ******** BUG ******** in libss7: As RLC
> > has not been received yet, the call must still be considered as active!
> > But we already forgot it and now we are surprised that we got some MSU
> > about it.
> >
> > [1] Got SUS but no call on CIC 12 PC 4097
> > [1] reseting the cic
> >
> > The situation is getting complicated, we are sending RSC.
> >
> > [1] ISUP timer t1 stopped on CIC 12 DPC: 4097
> > [1] ISUP timer t5 stopped on CIC 12 DPC: 4097
> > [1] ISUP timer t17 (300000ms) started on CIC 12 DPC 4097
> > [1] Len = 11 [ bd c4 08 85 01 10 02 c0 0c 00 12 ]
> > [1] FSN: 68 FIB 1
> > [1] BSN: 61 BIB 1
> > [1] >[4097:0] MSU
> > [1] [ bd c4 08 ]
> > [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> > [1] [ 85 ]
> > [1] OPC 8 DPC 4097 SLS 12
> > [1] [ 01 10 02 c0 ]
> > [1] CIC: 12
> > [1] [ 0c 00 ]
> > [1] Message Type: RSC(0x12)
> > [1] [ 12 ]
> >
> > And we get a RLC. IMHO it is a RLC confirming our REL, not
> > RSC (according to BSN, the peer already received all our MSUs,
> > but they probably already had the RLC queued, so they sent it)
> >
> > [1]
> > [1] Len = 12 [ c4 be 09 85 08 40 00 c4 0c 00 10 00 ]
> > [1] FSN: 62 FIB 1
> > [1] BSN: 68 BIB 1
> > [1] <[4097:0] MSU
> > [1] [ c4 be 09 ]
> > [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> > [1] [ 85 ]
> > [1] OPC 4097 DPC 8 SLS 12
> > [1] [ 08 40 00 c4 ]
> > [1] CIC: 12
> > [1] [ 0c 00 ]
> > [1] Message Type: RLC(0x10)
> > [1] [ 10 ]
> > [1]
> > [1] ISUP timer t17 stopped on CIC 12 DPC: 4097
> > Linkset 1: Processing event: ISUP_EVENT_RLC
> >
> > And now, we get a second RLC, probably to our RSC. There is a jump
> > in FSN because there was a MSU sent from them, which was not
> > related to our call.
> >
> > [1] Len = 12 [ c4 c0 09 85 08 40 00 c4 0c 00 10 00 ]
> > [1] FSN: 64 FIB 1
> > [1] BSN: 68 BIB 1
> > [1] <[4097:0] MSU
> > [1] [ c4 c0 09 ]
> > [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> > [1] [ 85 ]
> > [1] OPC 4097 DPC 8 SLS 12
> > [1] [ 08 40 00 c4 ]
> > [1] CIC: 12
> > [1] [ 0c 00 ]
> > [1] Message Type: RLC(0x10)
> > [1] [ 10 ]
> > [1]
> >
> > And this RLC seems unsolicited to us, because we were taking the
> > first RLC as a response to our RSC, which was not the case.
> >
> > [1] Got RLC but we didn't send REL/RSC on CIC 12 PC 4097
> >
> > So, no MSUs received from other linksets, all is perfectly fitting
> > together...
> >
> > This trace is a clear demonstration of an existing bug in libss7, which
> > may be formulated as follows: "When we are terminating the call and sending
> > REL to the remote party, we must keep the record of the connection and
> > silently accept and absorb all MSUs, which may come back, until we receive
> > a RLC or T5 expires".
> >
> > What do you think about it ?
> >
> > With regards,
> > Pavel
> >
> >> Another possibility is you're mixing the whole thing in a single linkset
> >> where you must use two linksets in the way you explained.
> >>
> >> Can you see those errors with just a few test calls ?
> >>
> >>
> >> I found about 20 bugs / structural design flaws in stock libss7 / dahdi
> >> mtp2 support. With my changes the mtp2/mtp3 layers are far more robust
> >> than stock libss7.
> >> Fixed all but a single one, related to knowing when the linkset is up or
> >> down, and not trying to send isup messages, especially IAM, through a down
> >> linkset - all sigchans down.
> >>
> >> If there's a bug, use ss7 set debug on linkset X to trace ss7 messages
> >> and track isup message flow.
> >>
> >> I used libss7 successfully with telcobridges tmedia, digitro switches,
> >> ericsson AXE, huawei NGN, Nortel DMS, several STPs, EWSS, Nec NEAX, and
> >> I'm probably missing a couple switch types.
> >> I never ran into SS7 / ISUP bugs of other switches, always libss7, but,
> >> the nature of the bugs found is nothing like what you're reporting.
> >> I started testing libss7 with those kinds of switches 5 years ago, so I
> >> have some mileage to make those statements, especially from reading and
> >> understanding a large portion of the libss7 / sig_ss7 / chan_dahdi code.
> >>
> >> The issue you're describing is caused by Asterisk getting ss7 messages
> >> that belong to another linkset or sending ss7 messages on the wrong ss7
> >> link.
> >> Check for UCIC or CFN ISUP responses.
> >>
> >>
> >>
> >> you need to define chan_dahdi.conf basically like this:
> >>
> >> ; basic ss7 / isup parameters, usually the same for the whole libss7 setup
> >> signalling=ss7
> >> ss7type=itu/ansi
> >> ss7_called_nai=subscriber/national/international/unknown
> >> ss7_calling_nai=subscriber/national/international/unknown
> >> networkindicator=national/international/...
> >>
> >> ; Your local pointcode
> >> pointcode = X
> >>
> >> ; Start definition for linkset N
> >> linkset = N
> >>
> >> adjpointcode = STP point code otherwise switch point code
> >> ; Instantiate a signalling link on channel 16 belonging to linkset N,
> >> ; with adjacency to adjpointcode
> >> sigchan = 16
> >> ; Define more signalling links if needed, with adjpointcode and sigchan
> >>
> >> defaultdpc = pointcode for ISUP messages
> >> cicbeginswith= CIC of the next voice channel defined
> >> ; Instantiate voice channel on linkset N, talking to PC defaultdpc, CIC
> >> ; numbering incremented automatically
> >> channel => dahdi channel range
> >>
> >> cicbeginswith= next CIC range, if non contiguous
> >> channel => dahdi channel range
> >>
> >> defaultdpc = another point code belonging to the same linkset (if links
> >> share signalling to multiple switches, typically links through an STP)
> >> ;repeat cicbeginswith, channel
> >>
> >> ; Starts definition of another linkset
> >> linkset = M
> >> ; repeat same
> >> ; sequence as above
> >>
> >>
> >> On 06/25/13 05:13, Pavel Troller wrote:
> >>> Hello Marcelo,
> >>>
> >>>> Per usual, read the fine manual. Wait, there's no manual !
> >>> You're right :-).
> >>>
> >>>> Since you seem to have done your part and actually know some ss7 and
> >>>> isup, here comes a hint.
> >>>>
> >>>> You created two or more linksets where you must have a single one.
> >>>> libss7 doesn't have the ss7 routing feature.
> >>> It seems strange to me. Let's try to explain this in a more detailed way.
> >>> There is 1 (one) Asterisk box.
> >>> It has 2 (two) "linksets" configured, with 1 (one) signalling link per linkset.
> >>> Linkset 1 is configured for one DPC and with CICs 1 - 496.
> >>> Linkset 2 is configured for another (different) DPC and also with CICs 1 - 496.
> >>> Both the systems connected to this Asterisk box are configured to respond
> >>> directly to the linkset between them and the Asterisk, so it's sure that
> >>> a MSU from DPC1 cannot come over LS2 and vice versa.
> >>> I hope that this extremely simple setup is in the scope of current libss7
> >>> functionality. Or am I wrong ?
> >>>
> >>>> In libss7 the linkset concept is different from the official ss7 linkset.
> >>>>
> >>>> All signalling links that carry ISUP traffic for a given set of channels
> >>>> must be kept on a single linkset, as well as all ISUP channels that go
> >>>> through those links.
> >>> I hope that my setup is conformant with this limitation.
> >>>
> >>>> It looks like you're getting incoming signalling for ISUP channels that
> >>>> are on another linkset.
> >>> It really looks like this, but I still hope it's not the case. Please note that
> >>> the traffic on the box is rather high, such an error occurs for one in, say,
> >>> 10000 call attempts. I think that in case of such a fatal routing problem,
> >>> which you are talking about, it wouldn't be possible to use the system
> >>> regularly.
> >>>
> >>>> I'm sure you didn't find any libss7 bug.
> >>> Really strong words!
> >>> I wouldn't say it for any of my programs :-).
> >>>
> >>>> I have a highly customized version of libss7/dahdi/asterisk, fixing lots
> >>>> of issues, but this isn't one of them.
> >>> Possibly your setup/usage scenario is a bit different ?
> >>>
> >>>
> >>>> Processed over one million call setups, with a very complex setup (6
> >>>> linksets, 7 links, 6E1 on a single switch, plus another 6E1 on remote
> >>>> switches using my simple STP solution, sharing the local links over SS7
> >>>> over UDP - my simpler proprietary alternative to sigtran).
> >>> These switches (I have two of them, but the second one is still on a regular
> >>> unpatched SS7 stack) make approx. 3 million call setups per week. My
> >>> record (without restarting/crashing Asterisk) is about 3 weeks with more than
> >>> 10 million calls.
> >>>
> >>>> If you need commercial support, contact me off list.
> >>> Thanks for your offer.
> >>>
> >>> With regards, Pavel
> >>>
> >>>> On 06/24/13 09:02, Pavel Troller wrote:
> >>>>> Hi!
> >>>>> I would like to share my experience with the deployment of this experimental SS7
> >>>>> branch.
> >>>>> The first impressions are good, especially the timers seem to work well,
> >>>>> saving many calls from being frozen.
> >>>>> However, there are still some strange things, which I would like to discuss
> >>>>> here, one by one.
> >>>>> The first one is, that the channel sometimes doesn't recognize a message
> >>>>> (mostly RLC), even if it comes from an action initiated by the channel itself.
> >>>>> Typically, the following is appearing often:
> >>>>>
> >>>>> [Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP timer t17 expired on CIC 27 DPC 4097
> >>>>> [1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic
> >>>>>
> >>>>> As I understand, there were some timeouts and now the channel tries to
> >>>>> recover by sending RSC and firing T17.
> >>>>> However, it seems that it immediately
> >>>>> rejects RLC, which comes back as a response to the RSC which was just sent
> >>>>> upon expiry of T17. And this appears again and again in the rhythm of T17,
> >>>>> and the channel is not operational.
> >>>>> ss7 show calls shows the following line for the misbehaving CIC:
> >>>>> 27 4097 11 IAM IAM
> >>>>>
> >>>>> Or, a very similar situation:
> >>>>> [2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
> >>>>> [2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC
> >>>>>
> >>>>> The first question is, why there was no call while SUS was received. My
> >>>>> idea is, that both the parties hung up their phones at the same time and
> >>>>> that the call was undergoing destruction on the Asterisk side (REL just sent
> >>>>> or something like this), while SUS arrived. Maybe the call was marked as
> >>>>> cleared even before RLC came back ? OK, I can understand this. But
> >>>>> if the CIC was reset as the first message says (i.e. RSC was sent), why is the
> >>>>> RLC going back not recognized then ?
> >>>>>
> >>>>> Or, just now the following appeared:
> >>>>>
> >>>>> [1] Got ACM but we didn't send IAM on CIC 10 PC 4097 reseting the cic
> >>>>> [1] Got RLC but we didn't send REL/RSC on CIC 10 PC 4097 reseting the cic
> >>>>>
> >>>>> Again, it's questionable, why this happened, but the second line seems
> >>>>> to indicate some brokenness again.
> >>>>>
> >>>>> To explain: The channel is operating on a gateway equipped with 16 E1s
> >>>>> and current traffic is about 10 CAPS, there are two linksets to two
> >>>>> cooperating exchanges. They are EWSDs, which have very mature and stable
> >>>>> SS7, so I'm almost sure that they are not making signalling errors.
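
The question raised above (was the call marked as cleared before RLC came back?) can be illustrated with a small model. This is only a sketch in Python, not libss7 code: the `Cic` class, the `CicState` names and the string message types are all hypothetical, and absorbing stray messages while an RSC is outstanding is my own assumption.

```python
from enum import Enum, auto


class CicState(Enum):
    IDLE = auto()
    ACTIVE = auto()
    REL_SENT = auto()  # we sent REL and wait for RLC (T1/T5 running)
    RSC_SENT = auto()  # we sent RSC and wait for RLC (T17 running)


class Cic:
    """Hypothetical per-circuit record; not a libss7 structure."""

    def __init__(self, cic, state=CicState.IDLE):
        self.cic = cic
        self.state = state
        self.log = []

    def send_rel(self):
        # Keep the record: the call is only gone once RLC arrives.
        self.state = CicState.REL_SENT

    def t5_expired(self):
        # Stop waiting for RLC and fall back to a circuit reset.
        if self.state is CicState.REL_SENT:
            self.state = CicState.RSC_SENT
            self.log.append("T5 expired, RSC sent")

    def receive(self, msg):
        pending = (CicState.REL_SENT, CicState.RSC_SENT)
        if msg == "RLC":
            if self.state in pending:
                self.log.append(f"RLC accepted in {self.state.name}")
                self.state = CicState.IDLE
            else:
                self.log.append("unexpected RLC")
        elif self.state in pending:
            # Crossing MSU (SUS, ACM, ANM, CPG, ...): absorb it silently.
            self.log.append(f"{msg} absorbed while release is pending")
        elif self.state is CicState.IDLE:
            self.log.append(f"{msg} without a call")
        else:
            self.log.append(f"{msg} handled by the active call")


cic = Cic(12, CicState.ACTIVE)
cic.send_rel()
for msg in ("SUS", "RLC"):  # a late SUS crossing our REL, then the RLC
    cic.receive(msg)
print(cic.state.name, cic.log)
```

With this ordering the late SUS is absorbed and the following RLC takes the circuit back to idle, so no RSC is needed and no later RLC looks unsolicited.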
> >>>>>
> >>>>> With regards,
> >>>>> Pavel
> >>>>>
> >>>>> --
> >>>>> _____________________________________________________________________
> >>>>> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
> >>>>>
> >>>>> asterisk-ss7 mailing list
> >>>>> To UNSUBSCRIBE or update options visit:
> >>>>> http://lists.digium.com/mailman/listinfo/asterisk-ss7
> >>>>>
> >>
> >> --
> >> Atenciosamente,
> >>
> >> Marcelo Pacheco
> >> M2J Comunicações e Informática
> >> Fixo: (27)2222-8118 / (27)2233-2296
> >> Vivo: (27)9964-5440
> >> Claro: (27)9312-5319
> >> MSN: marcelo at macp.eti.br
> >> E-mail: marcelo at m2j.com.br
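
For anyone who wants to re-check the BSN/FSN comparison in the traces quoted above, here is a small sketch that decodes the first 11 octets of an MSU as dumped in the `Len = ...` lines. It assumes the ITU layout (14-bit point codes, 4-bit SLS, 12-bit CIC), which matches the values printed in the trace. `decode_itu_msu` and `ISUP_NAMES` are names made up for this example, not libss7 functions.

```python
# Only the message types that appear in the quoted trace.
ISUP_NAMES = {0x0C: "REL", 0x0D: "SUS", 0x10: "RLC", 0x12: "RSC"}


def decode_itu_msu(hex_dump):
    """Decode the MTP2/MTP3/ISUP header of an ITU MSU given as hex octets."""
    data = bytes.fromhex(hex_dump)
    sio = data[3]
    label = int.from_bytes(data[4:8], "little")  # 32-bit ITU routing label
    return {
        "bsn": data[0] & 0x7F, "bib": data[0] >> 7,
        "fsn": data[1] & 0x7F, "fib": data[1] >> 7,
        "li": data[2] & 0x3F,
        "ni": sio >> 6, "priority": (sio >> 4) & 0x03, "user_part": sio & 0x0F,
        "dpc": label & 0x3FFF,
        "opc": (label >> 14) & 0x3FFF,
        "sls": label >> 28,
        "cic": int.from_bytes(data[8:10], "little") & 0x0FFF,
        "msg_type": data[10],
        "msg_name": ISUP_NAMES.get(data[10], "?"),
    }


rel = decode_itu_msu("bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90")
sus = decode_itu_msu("c0 bd 0a 85 08 40 00 c4 0c 00 0d 01 00")
# Sequence numbers are modulo 128; this plain comparison ignores wrap-around.
peer_has_seen_rel = sus["bsn"] >= rel["fsn"]
print(rel["msg_name"], "FSN", rel["fsn"], "/", sus["msg_name"], "BSN", sus["bsn"])
print("peer had acknowledged our REL:", peer_has_seen_rel)
```

The decoded fields agree with the trace: the REL went out with FSN 67, while the incoming SUS still carried BSN 64, so the peer had not yet acknowledged the REL when it sent the SUS.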