Possible ICE check bug
Seems that ICE checking has an issue one of our VPN environment where mediarelay handles the call proxying in natted env.
Seems ICE check is failing because of timing issues in the code.
Tested also in latest trunk and added some logging on called functions.
In successful case log shows:
09:09:01.021 icetp00 .Check 7: [2] 10.4.1.36:64134-->10.115.131.146:50391: state changed from In Progress to Succeeded
09:09:01.021 icetp00 .on_check_complete
09:09:01.021 icetp00 .Check 7 is successful
09:09:01.022 icetp00 .on_stun_send_msg
09:09:01.024 icetp00 pj_ice_sess_on_rx_pkt
09:09:01.024 icetp00 .on_stun_rx_request
09:09:01.024 icetp00 ..on_stun_send_msg
09:09:01.024 icetp00 .handle_incoming_check
09:09:01.024 icetp00 .update_comp_check
09:09:01.024 icetp00 .Valid check 0: [2] 10.115.131.146:50389-->10.115.131.146:50391 is nominated
09:09:01.024 icetp00 .Triggered check for check 7 not performed because it's completed
09:09:01.024 icetp00 ..on_check_complete
09:09:01.024 icetp00 ..Check 7 is successful and nominated
09:09:01.025 icetp00 ..Cancelling check 2: [2] 10.4.1.36:64134-->10.115.5.10:38561 (In Progress)
09:09:01.025 icetp00 ..Check 2: [2] 10.4.1.36:64134-->10.115.5.10:38561: state changed from In Progress to Failed
09:09:01.025 icetp00 ..Cancelling check 3: [2] 10.4.1.36:64134-->100.114.251.142:38561 (In Progress)
09:09:01.025 icetp00 ..Check 3: [2] 10.4.1.36:64134-->100.114.251.142:38561: state changed from In Progress to Failed
09:09:01.025 icetp00 ..Cancelling check 6: [2] 10.4.1.36:64134-->10.115.131.146:50395 (In Progress)
09:09:01.025 icetp00 ..Check 6: [2] 10.4.1.36:64134-->10.115.131.146:50395: state changed from In Progress to Failed
09:09:01.026 icetp00 pj_ice_sess_on_rx_pkt
09:09:01.026 icetp00 .on_stun_rx_request
09:09:01.026 icetp00 ..on_stun_send_msg
09:09:01.027 icetp00 .handle_incoming_check
09:09:01.027 icetp00 .Triggered check for check 4 not performed because it's in progress. Retransmitting
09:09:01.027 icetp00 ...on_stun_send_msg
09:09:01.046 icetp00 pj_ice_sess_on_rx_pkt
09:09:01.046 icetp00 .on_stun_rx_request
09:09:01.046 icetp00 ..on_stun_send_msg
09:09:01.046 icetp00 .handle_incoming_check
09:09:01.046 icetp00 .Triggered check for check 4 not performed because it's in progress. Retransmitting
09:09:01.046 icetp00 ...on_stun_send_msg
09:09:01.101 icetp00 .on_stun_send_msg
09:09:01.105 icetp00 pj_ice_sess_on_rx_pkt
09:09:01.105 icetp00 .on_stun_request_complete
09:09:01.105 icetp00 .Check 4: [1] 10.4.1.36:64130-->10.115.131.146:50394 (nominated): connectivity check SUCCESS
So if comp_id=1(check 7) gets "is successful and nominated" before next check is triggered, it triggers comp_id=2(Check 4 retransmits) and voice calls works nicely.
But if Triggered (handle_incoming_check) check happens before "is successful and nominated" has happened, then it tries the old Check 7 and will never trigger on Check 4 which is in Progress state:
12:54:37.239 icetp00 ..on_stun_send_msg
12:54:37.239 icetp00 .handle_incoming_check
12:54:37.239 icetp00 .update_comp_check
12:54:37.239 icetp00 .Valid check 0: [2] 10.115.131.146:50453-->10.115.131.146:50455 is nominated
12:54:37.239 icetp00 .Triggered check for check 7 not performed because it's completed
12:54:37.239 icetp00 ..on_check_complete
12:54:37.239 icetp00 ..Check 7 is successful and nominated
12:54:37.239 icetp00 ..Cancelling check 2: [2] 10.4.1.36:62952-->10.115.5.10:46907 (In Progress)
12:54:37.239 icetp00 ..Check 2: [2] 10.4.1.36:62952-->10.115.5.10:46907: state changed from In Progress to Failed
12:54:37.239 icetp00 ..Cancelling check 3: [2] 10.4.1.36:62952-->100.114.251.142:46907 (In Progress)
12:54:37.240 icetp00 ..Check 3: [2] 10.4.1.36:62952-->100.114.251.142:46907: state changed from In Progress to Failed
12:54:37.240 icetp00 ..Cancelling check 6: [2] 10.4.1.36:62952-->10.115.131.146:50459 (In Progress)
12:54:37.240 icetp00 ..Check 6: [2] 10.4.1.36:62952-->10.115.131.146:50459: state changed from In Progress to Failed
12:54:37.240 icetp00 pj_ice_sess_on_rx_pkt
12:54:37.241 icetp00 pj_ice_sess_on_rx_pkt
12:54:37.241 icetp00 .on_stun_rx_request
12:54:37.241 icetp00 ..on_stun_send_msg
12:54:37.241 icetp00 .handle_incoming_check
12:54:37.241 icetp00 .update_comp_check
12:54:37.242 icetp00 .Valid check 0: [2] 10.115.131.146:50453-->10.115.131.146:50455 is nominated
12:54:37.242 icetp00 .Triggered check for check 7 not performed because it's completed
12:54:37.242 icetp00 ..on_check_complete
12:54:37.242 icetp00 ..Check 7 is successful and nominated
12:54:37.244 icetp00 .on_stun_send_msg
Any ideas where to continue debugging and/or how to fix it? Seems that 20ms tiggering seems to stops happening when complete check on 1st comp_id takes too much time.
Br, Jonni
_______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@xxxxxxxxxxxxxxx http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org