Thanks for your reply.
On 2022/1/3 6:36 pm, Karsten Graul wrote:
On 31/12/2021 10:44, Wen Gu wrote:
On 2021/12/29 8:56 pm, Karsten Graul wrote:
On 28/12/2021 16:13, Wen Gu wrote:
We encountered some crashes caused by the race between the access
and the termination of link groups.
What do you think about it?
Hi Wen,
thank you, and I also wish you and your family a happy New Year!
Thanks for your detailed explanation, you convinced me of your idea to use
reference counting! I think it's a good solution for the various problems you describe.
I still think that, even if you saw no problems when conn->lgr is non-NULL after the lgr
has already been terminated, more attention should be paid to the places where conn->lgr is checked.
Thank you for the reminder. I agree with the concern.
These checks should be improved to avoid potential issues we haven't found yet.
For example, in smc_cdc_get_slot_and_msg_send() there is a check for !conn->lgr with the intention
to avoid working with a terminated link group.
Should all checks for !conn->lgr now be replaced by a check for conn->freed? Does this make sense?
In my humble opinion, we can replace !conn->lgr with !conn->alert_token_local.
If an SMC connection is registered to a link group successfully by smc_lgr_register_conn(),
conn->alert_token_local is set to non-zero. At this moment, conn->lgr is ready to be used.
And if the link group is terminated, conn->alert_token_local is reset to zero in smc_lgr_unregister_conn(),
meaning that the link group registered to the connection shouldn't be used anymore.
So I think checking conn->alert_token_local has the same effect as checking conn->lgr to
identify whether the link group pointed to by conn->lgr is still healthy and able to be used.
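To make the idea concrete, here is a rough sketch. It is only a simplified
userspace model, not the actual kernel code: the struct layouts, the refcnt
field and all model_* names are made up for illustration, and I assume that
with reference counting conn->lgr stays non-NULL after unregistering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures (hypothetical layout);
 * only the fields relevant to this discussion are modeled.
 */
struct lgr_model {
	int refcnt;			/* hypothetical reference count */
};

struct conn_model {
	struct lgr_model *lgr;		/* assumed to stay non-NULL after termination */
	uint32_t alert_token_local;	/* non-zero only while registered */
};

/* Models smc_lgr_register_conn(): a non-zero token marks conn->lgr usable. */
static void model_register_conn(struct conn_model *conn,
				struct lgr_model *lgr, uint32_t token)
{
	assert(token != 0);		/* zero is reserved for "not registered" */
	conn->lgr = lgr;
	lgr->refcnt++;			/* the connection holds a reference */
	conn->alert_token_local = token;
}

/* Models smc_lgr_unregister_conn(): the token is reset, the pointer is kept. */
static void model_unregister_conn(struct conn_model *conn)
{
	conn->alert_token_local = 0;
}

/* Hypothetical helper that would replace the bare !conn->lgr checks. */
static bool model_conn_lgr_valid(const struct conn_model *conn)
{
	return conn->lgr && conn->alert_token_local;
}
```

With such a helper, a caller like smc_cdc_get_slot_and_msg_send() would bail
out when the helper returns false instead of testing !conn->lgr directly.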
What do you think about it? :)
Thanks,
Wen Gu