On Thu, Mar 20, 2025 at 01:05:27PM +0100, Michal Luczaj wrote: > On 3/19/25 23:18, Cong Wang wrote: > > On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote: > >> Signal delivery during connect() may lead to a disconnect of an already > >> established socket. That involves removing socket from any sockmap and > >> resetting state to SS_UNCONNECTED. While it correctly restores socket's > >> proto, a call to vsock_bpf_recvmsg() might have been already under way in > >> another thread. If the connect()ing thread reassigns the vsock transport to > >> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE. > >> > > *THREAD 1* *THREAD 2* > > >> connect > >> / state = SS_CONNECTED / > >> sock_map_update_elem > >> vsock_bpf_recvmsg > >> psock = sk_psock_get() > >> lock sk > >> if signal_pending > >> unhash > >> sock_map_remove_links > > > > So vsock's ->recvmsg() should be restored after this, right? Then how is > > vsock_bpf_recvmsg() called afterward? > > I'm not sure I understand the question, so I've added a header above: those > are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called > afterwards. It was called before sock_map_remove_links(). Note that at the > time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still > executing (in T2). I thought the above vsock_bpf_recvmsg() on the right side completed before sock_map_remove_links(), sorry for the confusion. > > >> state = SS_UNCONNECTED > >> release sk > >> > >> connect > >> transport = NULL > >> lock sk > >> WARN_ON_ONCE(!vsk->transport) > >> > > > > And I am wondering why we need to WARN here since we can handle this error > > case correctly? > > The WARN and transport check are here for defensive measures, and to state > a contract. > > But I think I get your point. If we accept for a fact of life that BPF code > should be able to handle transport disappearing - then WARN can be removed > (while keeping the check) and this patch can be dropped. I am thinking whether we have more elegant way to handle this case, WARN looks not pretty. > > My aim, instead, was to keep things consistent. By which I mean sticking to > the conditions expressed in vsock_bpf_update_proto() as invariants; so that > vsock with a psock is guaranteed to have transport assigned. Other than the WARN, I am also concerned about locking vsock_bpf_recvmsg() because for example UDP is (almost) lockless, so enforcing the sock lock for all vsock types looks not flexible and may hurt performance. Maybe it is time to let vsock_bpf_rebuild_protos() build different hooks for different struct proto (as we did for TCP/UDP)? Thanks.