On Thu, Dec 19, 2024 at 05:09:42PM +0100, Michal Luczaj wrote:
On 12/19/24 16:12, Stefano Garzarella wrote:
On Thu, 19 Dec 2024 at 16:05, Michal Luczaj <mhal@xxxxxxx> wrote:
On 12/19/24 15:48, Stefano Garzarella wrote:
On Thu, 19 Dec 2024 at 15:36, Michal Luczaj <mhal@xxxxxxx> wrote:
On 12/19/24 09:19, Stefano Garzarella wrote:
...
I think the best thing though is to better understand how to handle
deassign, rather than checking everywhere that it's not null, also
because in some cases (like the one in virtio-vsock), it's also
important that the transport is the same.
My vote would be to apply your virtio_transport_recv_pkt() patch *and* make
it impossible-by-design to switch ->transport from non-NULL to NULL in
vsock_assign_transport().
I don't know if that's enough, in this case the problem is that some
response packets are intended for a socket, where the transport has
changed. So whether it's null or assigned but different, it's still a
problem we have to handle.
So making it impossible for the transport to be null, but allowing it
to be different (we can't prevent it from changing), doesn't solve the
problem for us, it only shifts it.
Got it. I assumed this issue would be solved by `vsk->transport !=
&t->transport` in the critical place(s).
(Note that BPF doesn't care if transport has changed; BPF just expects to
have _a_ transport.)
If I'm not mistaken, that would require rewriting vsock_assign_transport()
so that a new transport is assigned only once fully initialized, otherwise
keep the old one (still unhurt and functional) and return error. Because
failing connect() should not change anything under the hood, right?
Nope, connect should be able to change the transport.
Because a user can do an initial connect() that requires a specific
transport, this one fails maybe because there's no peer with that cid.
Then the user can redo the connect() to a different cid that requires
a different transport.
But the initial connect() failing does not change anything under the hood
(transport should/could stay NULL).
Nope, isn't null, it's assigned to a transport, because for example it
has to send a packet to connect to the remote CID and wait back for a
response that for example says the CID doesn't exist.
Ahh, I think I get it. So the initial connect() passed
vsock_assign_transport() successfully and then failed deeper in
vsock_connect(), right? That's fine. Let the socket have a useless
transport (a valid pointer nevertheless).
Just to be clear, it's not useless, since it's used to make the
connection. We know that it's useless just when we report that the
connection failed, so maybe we should de-assign it when we set
`sock->state = SS_UNCONNECTED`.
Sure, upcoming connect() can
assign a new (possibly useless just as well) transport, but there's no
reason to allow ->transport becoming NULL.
I'm not sure about this, in the end in the connection failure case, when
we set `sock->state = SS_UNCONNECTED`, we're returning the socket to a
pre-connect state, so it might make sense to also reset the transport to
NULL, so that we have exactly the same conditions.
And a pre-connect socket (where ->transport==NULL) is not an issue, because
BPF won't let it in any sockmap, so vsock_bpf_recvmsg() won't be reachable.
Anywa, thanks for explaining,
Michal
PS. Or ignore the above and remove the socket from the sockmap at every
reconnect? Possible unhash abuse:
I should take a closer look at unhash, but it might make sense!
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 5cf8109f672a..8a65153ee186 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -483,6 +483,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
if (vsk->transport == new_transport)
return 0;
+ const struct proto *prot = READ_ONCE(sk->sk_prot);
+ if (prot->unhash)
+ prot->unhash(sk);
+
/* transport->release() must be called with sock lock acquired.
* This path can only be taken during vsock_connect(), where we
* have already held the sock lock. In the other cases, this
diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
index 4aa6e74ec295..80deb4d70aea 100644
--- a/net/vmw_vsock/vsock_bpf.c
+++ b/net/vmw_vsock/vsock_bpf.c
@@ -119,6 +119,7 @@ static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *bas
*prot = *base;
prot->close = sock_map_close;
prot->recvmsg = vsock_bpf_recvmsg;
+ prot->unhash = sock_map_unhash;
prot->sock_is_readable = sk_msg_is_readable;
}
Then a successful re-connect assigns
the transport (NULL -> non-NULL). And it's all good because all I wanted to
avoid (because of BPF) was non-NULL -> NULL. Anyway, that's my possibly
shallow understanding :)
Note that non-NULL -> NULL should only occur before a connection is
established, so before any data is passed. Is this a problem for BPF?
Thanks,
Stefano