On 16/12/2024 12:50, Antonio Quartulli wrote:
On 16/12/2024 12:09, Sabrina Dubroca wrote:
[...]
Maybe we should call cancel_sync_work(&ovpn_sock->work) inside
ovpn_socket_get()?
So the latter will return NULL only when it is sure that the socket
has been
detached.
At that point we can skip the following return and continue along the
"new
socket" path.
What do you think?
The work may not have been scheduled yet? (small window between the
last kref_put and schedule_work)
Maybe a completion [Documentation/scheduler/completion.rst] would
solve it (but it makes things even more complex, unfortunately):
- at the end of ovpn_socket_detach: complete(&ovpn_sock->detached);
- in ovpn_socket_new when handling EALREADY:
wait_for_completion(&ovpn_sock->detached);
- in ovpn_socket_new for the new socket: init_completion(&ovpn_sock-
>detached);
but ovpn_sock could be gone immediately after complete(). Maybe
something with completion_done() before the kfree_rcu in
ovpn_socket_detach? I'm not that familiar with the completion API.
It seems the solution we are aiming for is more complex than the concept
of ovpn_socket per se :-D
I'll think a bit more about this..maybe we can avoid entering this
situation at all..
I see that there are some kref_put variants that acquire a lock just
before hitting zero and running the release cb.
If I implement a kref_put variant that acquires the lock_sock, I could
then perform the udp detach under lock, thus ensuring that zero'ing the
refcount and erasing the sk_user_data happens while holding the lock_sock.
This way I should be able to prevent the situation where "sk_user_data
still says EALREADY, but the refcnt is actually 0".
I hope adding this new API is fine.
I am giving it a try now.
Regards,
--
Antonio Quartulli
OpenVPN Inc.