Hi Hugo,
Instead of 'pj_stun_session', it is perhaps the 'pj_stun_tx_data' that had been destroyed when 'on_cache_timeout' was invoked, and this scenario seems to be possible indeed. So just created a patch (attached), unfortunately I could not easily reproduce the issue here, so if the issue is reproducible there, please help us testing the patch, thank you in advance.
BR,
nanang
On Fri, Jul 5, 2019 at 7:10 PM Hugo Sobral <hugo.sobral@xxxxxxxxx> wrote:
Hi Nanang,It was in this line:
PJ_LOG(5,(SNAME(tdata->sess), "Response cache deleted"));
Exception thrown: read access violation.
tdata->sess was 0xFFFFFFFFFFFFFFF7. occurred
pj_wrapper.dll!on_cache_timeout(pj_timer_heap_t * timer_heap, pj_timer_entry * entry) Line 226
at pjnath\src\pjnath\stun_session.c(226)
pj_wrapper.dll!pj_timer_heap_poll(pj_timer_heap_t * ht, pj_time_val * next_delay) Line 651
at pjlib\src\pj\timer.c(651)
[Managed to Native Transition]
From debugger, current values:tdata 0x000001c5f264f828 {prev=0xdddddddddddddddd {prev=??? next=??? pool=??? ...} next=0xdddddddddddddddd {...} ...} pj_stun_tx_data *
prev 0xdddddddddddddddd {prev=??? next=??? pool=??? ...} pj_stun_tx_data *
next 0xdddddddddddddddd {prev=??? next=??? pool=??? ...} pj_stun_tx_data *
pool 0xdddddddddddddddd {prev=??? next=??? obj_name=0xdddddddddddddded <Error reading characters of string.> ...} pj_pool_t *
sess 0xdddddddddddddddd {cfg=??? pool=??? grp_lock=??? ...} pj_stun_session *
msg 0xdddddddddddddddd {hdr={type=??? length=??? magic=??? ...} attr_count=??? attr=0xddddddddddddddf5 {...} } pj_stun_msg *
token 0xdddddddddddddddd void *
client_tsx 0xdddddddddddddddd {obj_name=0xdddddddddddddddd <Error reading characters of string.> cb={_on_complete_=...} ...} pj_stun_client_tsx *
retransmit -572662307 int
msg_magic 3722304989 unsigned int
msg_key 0x000001c5f264f868 "ÝÝÝÝÝÝÝÝÝÝÝÝ... unsigned char[12]
auth_info {realm={ptr=0xdddddddddddddddd <Error reading characters of string.> slen=-2459565876494606883 } username=...} pj_stun_req_cred_info
pkt 0xdddddddddddddddd void *
max_len 3722304989 unsigned int
pkt_size 15987178197214944733 unsigned __int64
addr_len 3722304989 unsigned int
dst_addr 0xdddddddddddddddd const void *
res_timer {user_data=0xdddddddddddddddd id=-572662307 cb=0xdddddddddddddddd ...} pj_timer_entry
tdata->sess 0xdddddddddddddddd {cfg=??? pool=??? grp_lock=??? ...} pj_stun_session *
cfg <Unable to read memory>
pool <Unable to read memory>
grp_lock <Unable to read memory>
cb {_on_send_msg_=??? _on_rx_request_=??? _on_request_complete_=??? ...} pj_stun_session_cb
user_data <Unable to read memory>
is_destroying <Unable to read memory>
use_fingerprint <Unable to read memory>
rx_pool <Unable to read memory>
dump_buf 0xddddddddddddde2d <Error reading characters of string.> char[1000]
log_flag <Unable to read memory>
auth_type <Unable to read memory>
cred {type=??? data="" slen=??? } username={ptr=??? slen=??? } data_type=??? ...} ...} } pj_stun_auth_cred
auth_retry <Unable to read memory>
next_nonce {ptr=??? slen=??? } pj_str_t
server_realm {ptr=??? slen=??? } pj_str_t
srv_name {ptr=??? slen=??? } pj_str_t
pending_request_list {prev=??? next=??? pool=??? ...} pj_stun_tx_data
cached_response_list {prev=??? next=??? pool=??? ...} pj_stun_tx_dataBest Regards,Hugo Sobral_______________________________________________On Wed, 3 Jul 2019 at 11:04, Nanang Izzuddin <nanang@xxxxxxxxx> wrote:Hi Hugo,The STUN tdata timer in STUN session (i.e: tdata->res_timer) is scheduled using 'pj_stun_session' group lock, so normally it should be safe from race condition between 'pj_stun_session' destroy and 'on_cache_timeout', but perhaps there is a corner case, could you also send the call stack traces when the crash occurs?BR,nanang_______________________________________________On Sat, Jun 29, 2019 at 1:30 AM Hugo Sobral <hugo.sobral@xxxxxxxxx> wrote:Hi,_______________________________________________Sometimes I get a crash in "pj_timer_heap_poll"I notice that there was some corrections about that in milestone 2.9But these corrections does not solve my issue.After some time debugging the issue, it looks like that it happens when the code:-------------------------------if (node->cb)
(*node->cb)(ht, node);-------------------------------in function 'pj_timer_heap_poll' is invoked with the function 'on_cache_timeout' at the same time that 'pj_stun_session' is being destroyed (this destroy is made by application's code because the call terminates).I cannot understand exactly how this happens,but I think that there is a time gap in function 'pj_timer_heap_poll' after the 'unlock_timer_heap(ht)' that 'pj_stun_session' can be destroyed and crash inside 'on_cache_timeout'Is this a known issue in the specified functions?Best regards,Hugo Sobral
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
Attachment:
stun_tdata_destroy_n_timer_race.diff
Description: Binary data
_______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@xxxxxxxxxxxxxxx http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org