On 16/12/2024 12:09, Sabrina Dubroca wrote:
[...]
Maybe we should call cancel_sync_work(&ovpn_sock->work) inside
ovpn_socket_get()?
So the latter will return NULL only when it is sure that the socket has been
detached.
At that point we can skip the following return and continue along the "new
socket" path.
What do you think?
The work may not have been scheduled yet? (small window between the
last kref_put and schedule_work)
Maybe a completion [Documentation/scheduler/completion.rst] would
solve it (but it makes things even more complex, unfortunately):
- at the end of ovpn_socket_detach: complete(&ovpn_sock->detached);
- in ovpn_socket_new when handling EALREADY: wait_for_completion(&ovpn_sock->detached);
- in ovpn_socket_new for the new socket: init_completion(&ovpn_sock->detached);
but ovpn_sock could be gone immediately after complete(). Maybe
something with completion_done() before the kfree_rcu in
ovpn_socket_detach? I'm not that familiar with the completion API.
It seems the solution we are aiming for is more complex than the concept
of ovpn_socket per se :-D
I'll think a bit more about this..maybe we can avoid entering this
situation at all..
However, this makes we wonder: what happens if we have two racing PEER_NEW
with the same non-yet-attached UDP socket?
mhmm, I remember noticing that, but it seems I never mentioned it in
my reviews. Sorry.
Maybe we should lock the socket in ovpn_udp_socket_attach() when checking
its user-data and setting it (in order to make the test-and-set atomic)?
I'd use the lock to protect all of ovpn_socket_new.
ovpn_tcp_socket_attach locks the socket but after doing the initial
checks, so 2 callers could both see sock->sk->sk_user_data == NULL and
do the full attach. And I don't think unlocking before
rcu_assign_sk_user_data is safe for either UDP or TCP.
I tend to agree here. Guarding the whole ovpn_socket_new with
lock_sock() seems the right thing to do.
I am specifically talking about this in udp.c:
345 /* make sure no pre-existing encapsulation handler exists */
346 rcu_read_lock();
347 old_data = rcu_dereference_sk_user_data(sock->sk);
348 if (!old_data) {
349 /* socket is currently unused - we can take it */
350 rcu_read_unlock();
351 setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
352 return 0;
353 }
We will end up returning 0 in both contexts and thus allocate two
ovpn_sockets instead of re-using the first one we allocated.
Does it make sense?
Yes.
[...]
[I have some more nits/typos here and there but I worry the
maintainers will get "slightly" annoyed if I make you repost 22
patches once again :) -- if that's all I find in the next few days,
everyone might be happier if I stash them and we get them fixed after
merging?]
If we have to rework this socket attaching part, it may be worth throwing in
those typ0 fixes too :)
ACK, I'll send them out.
Thanks.
Regards,
--
Antonio Quartulli
OpenVPN Inc.