Possible ICE check bug

Jonni Rainisto <jonni.rainisto@xxxxxxxxx> · Mon, 16 Jan 2017 08:40:33 +0000

Possible ICE check bug

Seems that ICE checking has an issue one of our VPN environment where mediarelay handles the call proxying in natted env.
Seems ICE check is failing because of timing issues in the code. 
Tested also in latest trunk and added some logging on called functions.

In successful case log shows:

09:09:01.021        icetp00  .Check 7: [2] 10.4.1.36:64134-->10.115.131.146:50391: state changed from In Progress to Succeeded
09:09:01.021        icetp00  .on_check_complete
09:09:01.021        icetp00  .Check 7 is successful
09:09:01.022        icetp00  .on_stun_send_msg
09:09:01.024        icetp00  pj_ice_sess_on_rx_pkt
09:09:01.024        icetp00  .on_stun_rx_request
09:09:01.024        icetp00  ..on_stun_send_msg
09:09:01.024        icetp00  .handle_incoming_check
09:09:01.024        icetp00  .update_comp_check
09:09:01.024        icetp00  .Valid check 0: [2] 10.115.131.146:50389-->10.115.131.146:50391 is nominated
09:09:01.024        icetp00  .Triggered check for check 7 not performed because it's completed
09:09:01.024        icetp00  ..on_check_complete
09:09:01.024        icetp00  ..Check 7 is successful  and nominated
09:09:01.025        icetp00  ..Cancelling check 2: [2] 10.4.1.36:64134-->10.115.5.10:38561 (In Progress)
09:09:01.025        icetp00  ..Check 2: [2] 10.4.1.36:64134-->10.115.5.10:38561: state changed from In Progress to Failed
09:09:01.025        icetp00  ..Cancelling check 3: [2] 10.4.1.36:64134-->100.114.251.142:38561 (In Progress)
09:09:01.025        icetp00  ..Check 3: [2] 10.4.1.36:64134-->100.114.251.142:38561: state changed from In Progress to Failed
09:09:01.025        icetp00  ..Cancelling check 6: [2] 10.4.1.36:64134-->10.115.131.146:50395 (In Progress)
09:09:01.025        icetp00  ..Check 6: [2] 10.4.1.36:64134-->10.115.131.146:50395: state changed from In Progress to Failed
09:09:01.026        icetp00  pj_ice_sess_on_rx_pkt
09:09:01.026        icetp00  .on_stun_rx_request
09:09:01.026        icetp00  ..on_stun_send_msg
09:09:01.027        icetp00  .handle_incoming_check
09:09:01.027        icetp00  .Triggered check for check 4 not performed because it's in progress. Retransmitting
09:09:01.027        icetp00  ...on_stun_send_msg
09:09:01.046        icetp00  pj_ice_sess_on_rx_pkt
09:09:01.046        icetp00  .on_stun_rx_request
09:09:01.046        icetp00  ..on_stun_send_msg
09:09:01.046        icetp00  .handle_incoming_check
09:09:01.046        icetp00  .Triggered check for check 4 not performed because it's in progress. Retransmitting
09:09:01.046        icetp00  ...on_stun_send_msg
09:09:01.101        icetp00  .on_stun_send_msg
09:09:01.105        icetp00  pj_ice_sess_on_rx_pkt
09:09:01.105        icetp00  .on_stun_request_complete
09:09:01.105        icetp00  .Check 4: [1] 10.4.1.36:64130-->10.115.131.146:50394 (nominated): connectivity check SUCCESS

So if comp_id=1(check 7) gets "is successful  and nominated" before next check is triggered, it triggers comp_id=2(Check 4 retransmits) and voice calls works nicely.
But if Triggered (handle_incoming_check) check happens before "is successful  and nominated" has happened, then it tries the old Check 7 and will never trigger on Check 4 which is in Progress state:

12:54:37.239        icetp00  ..on_stun_send_msg
12:54:37.239        icetp00  .handle_incoming_check
12:54:37.239        icetp00  .update_comp_check
12:54:37.239        icetp00  .Valid check 0: [2] 10.115.131.146:50453-->10.115.131.146:50455 is nominated
12:54:37.239        icetp00  .Triggered check for check 7 not performed because it's completed
12:54:37.239        icetp00  ..on_check_complete
12:54:37.239        icetp00  ..Check 7 is successful  and nominated
12:54:37.239        icetp00  ..Cancelling check 2: [2] 10.4.1.36:62952-->10.115.5.10:46907 (In Progress)
12:54:37.239        icetp00  ..Check 2: [2] 10.4.1.36:62952-->10.115.5.10:46907: state changed from In Progress to Failed
12:54:37.239        icetp00  ..Cancelling check 3: [2] 10.4.1.36:62952-->100.114.251.142:46907 (In Progress)
12:54:37.240        icetp00  ..Check 3: [2] 10.4.1.36:62952-->100.114.251.142:46907: state changed from In Progress to Failed
12:54:37.240        icetp00  ..Cancelling check 6: [2] 10.4.1.36:62952-->10.115.131.146:50459 (In Progress)
12:54:37.240        icetp00  ..Check 6: [2] 10.4.1.36:62952-->10.115.131.146:50459: state changed from In Progress to Failed
12:54:37.240        icetp00  pj_ice_sess_on_rx_pkt
12:54:37.241        icetp00  pj_ice_sess_on_rx_pkt
12:54:37.241        icetp00  .on_stun_rx_request
12:54:37.241        icetp00  ..on_stun_send_msg
12:54:37.241        icetp00  .handle_incoming_check
12:54:37.241        icetp00  .update_comp_check
12:54:37.242        icetp00  .Valid check 0: [2] 10.115.131.146:50453-->10.115.131.146:50455 is nominated
12:54:37.242        icetp00  .Triggered check for check 7 not performed because it's completed
12:54:37.242        icetp00  ..on_check_complete
12:54:37.242        icetp00  ..Check 7 is successful  and nominated
12:54:37.244        icetp00  .on_stun_send_msg

Any ideas where to continue debugging and/or how to fix it? Seems that 20ms tiggering seems to stops happening when complete check on 1st comp_id takes too much time.

Br, Jonni
_______________________________________________
Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org