What code are you using? Is this not stock libss7?
Stock libss7 can't decode ISUP SUS/RES like that.
In my code, I explicitly ignore ALL SUS/RES; they need no processing
in Brazilian ISUP.
Which Asterisk and kernel DAHDI versions are you running?

If you enable dahdi_pcap:

# dahdi_pcap -c 16 -f /tmp/mycap.ss7
Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
Packets captured: 7

Then you can analyze the capture in Wireshark / Ethereal.
But it has one bug: if you shut down the owner of the link while
dahdi_pcap is running, the system will reset on its own.
As long as you don't leave dahdi_pcap running, it's not a problem.

On 06/25/13 09:38, Pavel Troller wrote:
> Hello Marcelo,
>   so I did some tracing. It was really hard to isolate the MSUs for one particular
> connection; I had to collect them from a file of about 5 MB, but OK, it's done,
> and it's in total harmony with my original ideas. So, let's look at it with
> me:
>
> Initial conditions: There is a call running on LS1, DPC 4097, CIC 12.
>
> Our Asterisk decided to clear this call down:
>
> [1] ISUP timer t1 (15000ms) started on CIC 12 DPC 4097
> [1] ISUP timer t5 (300000ms) started on CIC 12 DPC 4097
> [1] Len = 16 [ bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90 ]
> [1] FSN: 67 FIB 1
> [1] BSN: 60 BIB 1
> [1] >[4097:0] MSU
> [1] [ bc c3 0d ]
> [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> [1] [ 85 ]
> [1] OPC 8 DPC 4097 SLS 12
> [1] [ 01 10 02 c0 ]
> [1] CIC: 12
> [1] [ 0c 00 ]
> [1] Message Type: REL(0x0c)
> [1] [ 0c ]
> [1] --VARIABLE LENGTH PARMS[1]--
> [1] Cause Indicator:
> [1] Coding Standard: 0
> [1] Location: 1
> [1] Cause Class: 1
> [1] Cause Subclass: 0
> [1] Cause: Normal call clearing (16)
> [1] [ 02 81 90 ]
> [1]
>
> But the remote party also decided to hang up, and our REL just crossed
> their SUS going back (please look at the BSN and compare it with our FSN: they
> don't know about our REL yet).
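
The BSN/FSN comparison in the paragraph above can be checked directly from the hex dumps. The following is a minimal sketch, assuming the standard ITU layout (MTP2 header, SIO, 14-bit point codes in the routing label, 12-bit CIC); `decode_msu` and `ISUP_TYPES` are illustrative names and not part of libss7. The second byte string is the SUS dump that follows in the trace.

```python
# Sketch only: decode the fixed part of an ITU MSU as dumped by the trace.
# Names are hypothetical; the bit layout is the ITU one (14-bit point codes).

ISUP_TYPES = {0x0C: "REL", 0x0D: "SUS", 0x10: "RLC", 0x12: "RSC"}


def decode_msu(hexdump: str) -> dict:
    b = bytes.fromhex(hexdump)                    # whitespace between bytes is allowed
    label = int.from_bytes(b[4:8], "little")      # 32-bit routing label
    return {
        "bsn": b[0] & 0x7F, "bib": b[0] >> 7,     # backward sequence number / indicator
        "fsn": b[1] & 0x7F, "fib": b[1] >> 7,     # forward sequence number / indicator
        "li": b[2] & 0x3F,                        # length indicator
        "user_part": b[3] & 0x0F,                 # 5 = ISUP
        "priority": (b[3] >> 4) & 0x03,
        "network_indicator": b[3] >> 6,
        "dpc": label & 0x3FFF,
        "opc": (label >> 14) & 0x3FFF,
        "sls": label >> 28,
        "cic": int.from_bytes(b[8:10], "little") & 0x0FFF,
        "type": ISUP_TYPES.get(b[10], hex(b[10])),
    }


rel = decode_msu("bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90")  # our REL
sus = decode_msu("c0 bd 0a 85 08 40 00 c4 0c 00 0d 01 00")           # their SUS

# Sequence numbers are modulo 128; a non-zero gap means the peer had not yet
# acknowledged everything we sent, including the REL, when it sent the SUS.
unacked = (rel["fsn"] - sus["bsn"]) % 128
print(rel["type"], rel["fsn"], sus["type"], sus["bsn"], unacked)
```

The decoded fields match the values libss7 prints in the trace (OPC 8, DPC 4097, SLS 12, CIC 12), and the gap between our FSN and their BSN supports the reading that the two messages crossed on the link.
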
>
> [1] Len = 13 [ c0 bd 0a 85 08 40 00 c4 0c 00 0d 01 00 ]
> [1] FSN: 61 FIB 1
> [1] BSN: 64 BIB 1
> [1] <[4097:0] MSU
> [1] [ c0 bd 0a ]
> [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> [1] [ 85 ]
> [1] OPC 4097 DPC 8 SLS 12
> [1] [ 08 40 00 c4 ]
> [1] CIC: 12
> [1] [ 0c 00 ]
> [1] Message Type: SUS(0x0d)
> [1] [ 0d ]
> [1] --FIXED LENGTH PARMS[1]--
> [1] Suspend/Resume Indicators:
> [1] SUS/RES indicator: Network initiated (1)
> [1] [ 01 ]
> [1]
>
> And what happens now is a clear ******** BUG ******** in libss7: As RLC
> has not been received yet, the call must still be considered as active!
> But we already forgot it and now we are surprised that we got some MSU
> about it.
>
> [1] Got SUS but no call on CIC 12 PC 4097 reseting the cic
>
> The situation is getting complicated, we are sending RSC.
>
> [1] ISUP timer t1 stopped on CIC 12 DPC: 4097
> [1] ISUP timer t5 stopped on CIC 12 DPC: 4097
> [1] ISUP timer t17 (300000ms) started on CIC 12 DPC 4097
> [1] Len = 11 [ bd c4 08 85 01 10 02 c0 0c 00 12 ]
> [1] FSN: 68 FIB 1
> [1] BSN: 61 BIB 1
> [1] >[4097:0] MSU
> [1] [ bd c4 08 ]
> [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> [1] [ 85 ]
> [1] OPC 8 DPC 4097 SLS 12
> [1] [ 01 10 02 c0 ]
> [1] CIC: 12
> [1] [ 0c 00 ]
> [1] Message Type: RSC(0x12)
> [1] [ 12 ]
>
> And we get a RLC. IMHO it is a RLC confirming our REL, not
> RSC (according to BSN, the peer already received all our MSUs,
> but they probably already had the RLC queued, so they sent it)
>
> [1]
> [1] Len = 12 [ c4 be 09 85 08 40 00 c4 0c 00 10 00 ]
> [1] FSN: 62 FIB 1
> [1] BSN: 68 BIB 1
> [1] <[4097:0] MSU
> [1] [ c4 be 09 ]
> [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> [1] [ 85 ]
> [1] OPC 4097 DPC 8 SLS 12
> [1] [ 08 40 00 c4 ]
> [1] CIC: 12
> [1] [ 0c 00 ]
> [1] Message Type: RLC(0x10)
> [1] [ 10 ]
> [1]
> [1] ISUP timer t17 stopped on CIC 12 DPC: 4097
> Linkset 1: Processing event: ISUP_EVENT_RLC
>
> And now, we get a second RLC, probably to our RSC.
> There is a jump in FSN because there was an MSU sent from them that was not
> related to our call.
>
> [1] Len = 12 [ c4 c0 09 85 08 40 00 c4 0c 00 10 00 ]
> [1] FSN: 64 FIB 1
> [1] BSN: 68 BIB 1
> [1] <[4097:0] MSU
> [1] [ c4 c0 09 ]
> [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> [1] [ 85 ]
> [1] OPC 4097 DPC 8 SLS 12
> [1] [ 08 40 00 c4 ]
> [1] CIC: 12
> [1] [ 0c 00 ]
> [1] Message Type: RLC(0x10)
> [1] [ 10 ]
> [1]
>
> And this RLC seems unsolicited to us, because we took the
> first RLC as a response to our RSC, which was not the case.
>
> [1] Got RLC but we didn't send REL/RSC on CIC 12 PC 4097
>
> So, no MSUs were received from other linksets; everything fits
> together perfectly...
>
> This trace is a clear demonstration of an existing bug in libss7, which
> may be formulated as follows: "When we are terminating the call and sending
> REL to the remote party, we must keep the record of the connection and
> silently accept and absorb all MSUs which may come back, until we receive
> an RLC or T5 expires".
>
> What do you think about it?
>
> With regards,
> Pavel
>
>> Another possibility is that you're mixing the whole thing in a single linkset
>> where you must use two linksets in the way you explained.
>>
>> Can you see those errors with just a few test calls?
>>
>>
>> I found about 20 bugs / structural design flaws in stock libss7 / dahdi
>> mtp2 support. With my changes the mtp2/mtp3 layers are far more robust
>> than stock libss7.
>> I fixed all but a single one, related to knowing when the linkset is up or
>> down, and not trying to send ISUP messages, especially IAM, through a down
>> linkset (all sigchans down).
>>
>> If there's a bug, use "ss7 set debug on linkset X" to trace ss7 messages
>> and track the ISUP message flow.
>>
>> I used libss7 successfully with TelcoBridges Tmedia, Digitro switches,
>> Ericsson AXE, Huawei NGN, Nortel DMS, several STPs, EWSS, NEC NEAX, and
>> I'm probably missing a couple of switch types.
>> I never ran into SS7 / ISUP bugs in other switches, always in libss7, but
>> the bugs I found are nothing like what you're reporting.
>> I started testing libss7 with those kinds of switches 5 years ago, so I
>> have some mileage to make those statements, especially from reading and
>> understanding a large portion of the libss7 / sig_ss7 / chan_dahdi code.
>>
>> The issue you're describing is caused by Asterisk getting ss7 messages
>> that belong to another linkset or sending ss7 messages on the wrong ss7
>> link.
>> Check for UCIC or CFN ISUP responses.
>>
>>
>>
>> You need to define chan_dahdi.conf basically like this:
>>
>> ; basic ss7 / isup parameters, usually the same for the whole libss7 setup
>> signalling=ss7
>> ss7type=itu/ansi
>> ss7_called_nai=subscriber/national/international/unknown
>> ss7_calling_nai=subscriber/national/international/unknown
>> networkindicator=national/international/...
>>
>> ; Your local pointcode
>> pointcode = X
>>
>> ; Start definition for linkset N
>> linkset = N
>>
>> adjpointcode = STP point code otherwise switch point code
>> ; Instantiate a signalling link on channel 16 belonging to linkset N,
>> ; with adjacency to adjpointcode
>> sigchan = 16
>> ; Define more signalling links if needed, with adjpointcode and sigchan
>>
>> defaultdpc = pointcode for ISUP messages
>> cicbeginswith = CIC of the next voice channel defined
>> ; Instantiate voice channel on linkset N, talking to PC defaultdpc, CIC
>> ; numbering incremented automatically
>> channel => dahdi channel range
>>
>> cicbeginswith = next CIC range, if non contiguous
>> channel => dahdi channel range
>>
>> defaultdpc = another point code belonging to the same linkset (if links
>> share signalling to multiple switches, typically links through an STP)
>> ; repeat cicbeginswith, channel
>>
>> ; Starts definition of another linkset
>> linkset = M
>> ; repeat same sequence as above
>>
>>
>> On 06/25/13 05:13, Pavel Troller wrote:
>>> Hello Marcelo,
>>>
>>>> Per usual,
>>>> read the fine manual. Wait, there's no manual!
>>> You're right :-).
>>>
>>>> Since you seem to have done your part and actually know some ss7 and
>>>> isup, here comes a hint.
>>>>
>>>> You created two or more linksets where you must have a single one.
>>>> libss7 doesn't have the ss7 routing feature.
>>> It seems strange to me. Let me try to explain this in more detail.
>>> There is 1 (one) Asterisk box.
>>> It has 2 (two) "linksets" configured, with 1 (one) signalling link per linkset.
>>> Linkset 1 is configured for one DPC and with CICs 1 - 496.
>>> Linkset 2 is configured for another (different) DPC and also with CICs 1 - 496.
>>> Both the systems connected to this Asterisk box are configured to respond
>>> directly on the linkset between them and the Asterisk, so it's certain that
>>> an MSU from DPC1 cannot come over LS2 and vice versa.
>>> I hope that this extremely simple setup is within the scope of current libss7
>>> functionality. Or am I wrong?
>>>
>>>> In libss7 the linkset concept is different from the official ss7 linkset.
>>>>
>>>> All signalling links that carry ISUP traffic for a given set of channels
>>>> must be kept on a single linkset, as well as all ISUP channels that go
>>>> through those links.
>>> I hope that my setup conforms to this limitation.
>>>
>>>> It looks like you're getting incoming signalling for ISUP channels that
>>>> are on another linkset.
>>> It really looks like this, but I still hope it's not the case. Please note that
>>> the traffic on the box is rather high; such an error occurs for one in, say,
>>> 10000 call attempts. I think that with a routing problem as fatal as the one
>>> you are talking about, it wouldn't be possible to use the system
>>> regularly.
>>>
>>>> I'm sure you didn't find any libss7 bug.
>>> Really strong words! I wouldn't say it for any of my programs :-).
>>>
>>>> I have a highly customized version of libss7/dahdi/asterisk, fixing lots
>>>> of issues, but this isn't one of them.
>>> Possibly your setup/usage scenario is a bit different?
>>>
>>>
>>>> Processed over one million call setups, with a very complex setup (6
>>>> linksets, 7 links, 6E1 on a single switch, plus another 6E1 on remote
>>>> switches using my simple STP solution, sharing the local links over SS7
>>>> over UDP - my simpler proprietary alternative to sigtran).
>>> These switches (I have two of them, but the second one is still on a regular
>>> unpatched SS7 stack) make approx. 3 million call setups per week. My
>>> record (without restarting/crashing Asterisk) is about 3 weeks with more than
>>> 10 million calls.
>>>
>>>> If you need commercial support, contact me off list.
>>> Thanks for your offer.
>>>
>>> With regards, Pavel
>>>
>>>> On 06/24/13 09:02, Pavel Troller wrote:
>>>>> Hi!
>>>>> I would like to share my experience with deploying this experimental SS7
>>>>> branch.
>>>>> The first impressions are good; especially the timers seem to work well,
>>>>> saving many calls from being frozen.
>>>>> However, there are still some strange things, which I would like to discuss
>>>>> here, one by one.
>>>>> The first one is that the channel sometimes doesn't recognize a message
>>>>> (mostly RLC), even though it results from an action initiated by the channel itself.
>>>>> Typically, the following appears often:
>>>>>
>>>>> [Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP timer t17 expired on CIC 27 DPC 4097
>>>>> [1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic
>>>>>
>>>>> As I understand it, there were some timeouts and now the channel tries to
>>>>> recover by sending RSC and starting T17. However, it seems that it immediately
>>>>> rejects the RLC which comes back as a response to the RSC that was just sent
>>>>> upon expiry of T17. And this repeats again and again in the rhythm of T17,
>>>>> and the channel is not operational.
>>>>> "ss7 show calls" shows the following line for the misbehaving CIC:
>>>>> 27 4097 11 IAM IAM
>>>>>
>>>>> Or, a very similar situation:
>>>>> [2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
>>>>> [2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC
>>>>>
>>>>> The first question is why there was no call when the SUS was received. My
>>>>> idea is that both parties hung up their phones at the same time and
>>>>> that the call was being torn down on the Asterisk side (REL just sent
>>>>> or something like this) when the SUS arrived. Maybe the call was marked as
>>>>> cleared even before the RLC came back? OK, I can understand this. But
>>>>> if the CIC was reset as the first message says (i.e. RSC was sent), why is the
>>>>> RLC coming back not recognized then?
>>>>>
>>>>> Or, just now the following appeared:
>>>>>
>>>>> [1] Got ACM but we didn't send IAM on CIC 10 PC 4097 reseting the cic
>>>>> [1] Got RLC but we didn't send REL/RSC on CIC 10 PC 4097 reseting the cic
>>>>>
>>>>> Again, it's questionable why this happened, but the second line seems
>>>>> to indicate some brokenness again.
>>>>>
>>>>> To explain: The channel is operating on a gateway equipped with 16 E1s
>>>>> and current traffic is about 10 CAPS; there are two linksets to two
>>>>> cooperating exchanges. They are EWSDs, which have a very mature and stable
>>>>> SS7, so I'm almost sure that they are not making signalling errors.
>>>>>
>>>>> With regards,
>>>>> Pavel
>>>>>
>>>>> --
>>>>> _____________________________________________________________________
>>>>> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
>>>>>
>>>>> asterisk-ss7 mailing list
>>>>> To UNSUBSCRIBE or update options visit:
>>>>> http://lists.digium.com/mailman/listinfo/asterisk-ss7
>>>>>
>>
>> --
>> Regards,
>>
>> Marcelo Pacheco
>> M2J Comunicações e Informática
>> Landline: (27)2222-8118 / (27)2233-2296
>> Vivo: (27)9964-5440
>> Claro: (27)9312-5319
>> MSN: marcelo at macp.eti.br
>> E-mail: marcelo at m2j.com.br

--
Regards,

Marcelo Pacheco
M2J Comunicações e Informática
Landline: (27)2222-8118 / (27)2233-2296
Vivo: (27)9964-5440
Claro: (27)9312-5319
MSN: marcelo at macp.eti.br
E-mail: marcelo at m2j.com.br
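
The rule Pavel formulates in the quoted message ("keep the record of the connection and silently accept and absorb all MSUs ... until we receive a RLC or T5 expires") can be pictured as a small per-CIC state machine. The sketch below is only an illustration of that proposal, assuming three simplified states; it is not libss7 code, and `Cic`, `CicState` and the returned strings are hypothetical names. A real implementation would also have to handle REL collisions, T1 retransmission and the other ISUP timers.

```python
from enum import Enum, auto


class CicState(Enum):
    IDLE = auto()        # no call record on this CIC
    ACTIVE = auto()      # call in progress
    RELEASING = auto()   # REL sent, waiting for RLC (T1/T5 running)


class Cic:
    """Hypothetical per-circuit record illustrating the proposed rule."""

    def __init__(self) -> None:
        self.state = CicState.IDLE

    def send_rel(self) -> None:
        # Keep the call record after sending REL instead of dropping it.
        if self.state is CicState.ACTIVE:
            self.state = CicState.RELEASING

    def on_message(self, msg_type: str) -> str:
        if self.state is CicState.RELEASING:
            if msg_type == "RLC":
                self.state = CicState.IDLE
                return "released"
            return "absorbed"      # e.g. a SUS that crossed our REL
        if self.state is CicState.IDLE:
            return "unexpected"    # the case where the trace shows an RSC being sent
        return "handled"           # normal in-call processing (not modelled here)

    def on_t5_expiry(self) -> str:
        # T5 bounds how long the record is kept if no RLC ever arrives.
        if self.state is CicState.RELEASING:
            self.state = CicState.IDLE
            return "reset"
        return "ignored"


cic = Cic()
cic.state = CicState.ACTIVE
cic.send_rel()
print(cic.on_message("SUS"))   # absorbed
print(cic.on_message("RLC"))   # released
```

With this ordering the crossing SUS from the trace would be absorbed while the circuit is still in the releasing state, so no RSC would be sent and the first RLC would be matched to the REL.
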