On 12/19/24 16:12, Stefano Garzarella wrote: > On Thu, 19 Dec 2024 at 16:05, Michal Luczaj <mhal@xxxxxxx> wrote: >> >> On 12/19/24 15:48, Stefano Garzarella wrote: >>> On Thu, 19 Dec 2024 at 15:36, Michal Luczaj <mhal@xxxxxxx> wrote: >>>> >>>> On 12/19/24 09:19, Stefano Garzarella wrote: >>>>> ... >>>>> I think the best thing though is to better understand how to handle >>>>> deassign, rather than checking everywhere that it's not null, also >>>>> because in some cases (like the one in virtio-vsock), it's also >>>>> important that the transport is the same. >>>> >>>> My vote would be to apply your virtio_transport_recv_pkt() patch *and* make >>>> it impossible-by-design to switch ->transport from non-NULL to NULL in >>>> vsock_assign_transport(). >>> >>> I don't know if that's enough, in this case the problem is that some >>> response packets are intended for a socket, where the transport has >>> changed. So whether it's null or assigned but different, it's still a >>> problem we have to handle. >>> >>> So making it impossible for the transport to be null, but allowing it >>> to be different (we can't prevent it from changing), doesn't solve the >>> problem for us, it only shifts it. >> >> Got it. I assumed this issue would be solved by `vsk->transport != >> &t->transport` in the critical place(s). >> >> (Note that BPF doesn't care if transport has changed; BPF just expects to >> have _a_ transport.) >> >>>> If I'm not mistaken, that would require rewriting vsock_assign_transport() >>>> so that a new transport is assigned only once fully initialized, otherwise >>>> keep the old one (still unhurt and functional) and return error. Because >>>> failing connect() should not change anything under the hood, right? >>>> >>> >>> Nope, connect should be able to change the transport. >>> >>> Because a user can do an initial connect() that requires a specific >>> transport, this one fails maybe because there's no peer with that cid. >>> Then the user can redo the connect() to a different cid that requires >>> a different transport. >> >> But the initial connect() failing does not change anything under the hood >> (transport should/could stay NULL). > > Nope, isn't null, it's assigned to a transport, because for example it > has to send a packet to connect to the remote CID and wait back for a > response that for example says the CID doesn't exist. Ahh, I think I get it. So the initial connect() passed vsock_assign_transport() successfully and then failed deeper in vsock_connect(), right? That's fine. Let the socket have a useless transport (a valid pointer nevertheless). Sure, upcoming connect() can assign a new (possibly useless just as well) transport, but there's no reason to allow ->transport becoming NULL. And a pre-connect socket (where ->transport==NULL) is not an issue, because BPF won't let it in any sockmap, so vsock_bpf_recvmsg() won't be reachable. Anywa, thanks for explaining, Michal PS. Or ignore the above and remove the socket from the sockmap at every reconnect? Possible unhash abuse: diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 5cf8109f672a..8a65153ee186 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -483,6 +483,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) if (vsk->transport == new_transport) return 0; + const struct proto *prot = READ_ONCE(sk->sk_prot); + if (prot->unhash) + prot->unhash(sk); + /* transport->release() must be called with sock lock acquired. * This path can only be taken during vsock_connect(), where we * have already held the sock lock. In the other cases, this diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c index 4aa6e74ec295..80deb4d70aea 100644 --- a/net/vmw_vsock/vsock_bpf.c +++ b/net/vmw_vsock/vsock_bpf.c @@ -119,6 +119,7 @@ static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *bas *prot = *base; prot->close = sock_map_close; prot->recvmsg = vsock_bpf_recvmsg; + prot->unhash = sock_map_unhash; prot->sock_is_readable = sk_msg_is_readable; } >> Then a successful re-connect assigns >> the transport (NULL -> non-NULL). And it's all good because all I wanted to >> avoid (because of BPF) was non-NULL -> NULL. Anyway, that's my possibly >> shallow understanding :)