> On Aug 28, 2023, at 2:35 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Mon, 2023-08-28 at 18:02 +0000, Chuck Lever III wrote: >> Hi - >> >> Anna has identified a window during a handshake where a socket >> close will trigger a crash. This could be a common scenario for >> a rejected handshake. >> >> <idle>-0 [003] 5405.466661: rpc_socket_state_change: >> socket:[59266] srcaddr=192.168.122.166:833 >> dstaddr=192.168.122.100:2049 state=2 (CONNECTING) sk_state=8 >> (CLOSE_WAIT) >> <idle>-0 [003] 5405.466665: xs_data_ready: >> peer=[192.168.122.100]:2049 >> <idle>-0 [003] 5405.466668: rpc_socket_state_change: >> socket:[59266] srcaddr=192.168.122.166:833 >> dstaddr=192.168.122.100:2049 state=2 (CONNECTING) sk_state=7 (CLOSE) >> <idle>-0 [003] 5405.466669: rpc_socket_error: >> error=-32 socket:[59266] srcaddr=192.168.122.166:833 >> dstaddr=192.168.122.100:2049 state=2 (CONNECTING) sk_state=7 (CLOSE) >> kworker/u8:2-2367 [001] 5405.466786: xprt_disconnect_force: >> peer=[192.168.122.100]:2049 state=BOUND >> tlshd-6005 [002] 5405.466786: handshake_cmd_done: >> req=0xffff9e8a43c7d000 sk=0xffff9e8a42e2f1c0 fd=6 >> tlshd-6005 [002] 5405.466786: handshake_complete: >> req=0xffff9e8a43c7d000 sk=0xffff9e8a42e2f1c0 status=13 >> kworker/u8:2-2367 [001] 5405.466789: xprt_disconnect_auto: >> peer=[192.168.122.100]:2049 state=LOCKED|CLOSE_WAIT|BOUND >> kworker/u8:2-2367 [001] 5405.466790: function: >> xs_reset_transport >> kworker/u8:2-2367 [001] 5405.466794: kernel_stack: >> <stack trace > >> => ftrace_trampoline (ffffffffc0c3409b) >> => xs_reset_transport (ffffffffc0814225) >> => xs_tcp_shutdown (ffffffffc0816b3e) >> => xprt_autoclose (ffffffffc0811799) >> => process_one_work (ffffffffa6ae2777) >> => worker_thread (ffffffffa6ae2d67) >> => kthread (ffffffffa6ae93e7) >> => ret_from_fork (ffffffffa6a424f7) >> => ret_from_fork_asm (ffffffffa6a03aeb) >> kworker/u8:2-2367 [001] 5405.466795: handshake_cancel_busy: >> req=0xffff9e8a43c7d000 sk=0xffff9e8a42e2f1c0 >> kworker/u8:2-2367 [001] 5405.466795: rpc_socket_state_change: >> socket:[59266] srcaddr=192.168.122.166:833 >> dstaddr=192.168.122.100:2049 state=4 (DISCONNECTING) sk_state=7 >> (CLOSE) >> kworker/u8:2-2367 [001] 5405.466797: rpc_socket_close: >> socket:[59266] srcaddr=192.168.122.166:833 >> dstaddr=192.168.122.100:2049 state=4 (DISCONNECTING) sk_state=7 >> (CLOSE) >> kworker/u8:2-2367 [001] 5405.466797: xprt_disconnect_done: >> peer=[192.168.122.100]:2049 state=LOCKED|BOUND >> kworker/u8:2-2367 [001] 5405.466798: xprt_release_xprt: >> task:ffffffff@ffffffff snd_task:ffffffff >> kworker/u8:7-2407 [000] 5405.466804: bprint: >> xs_tcp_tls_finish_connecting: xs_tcp_tls_finish_connecting(handshake >> status=0): sock is NULL!! >> >> Do you have any suggestions about how to safely prevent ->close >> from firing on the lower transport during a TLS handshake? >> > > Shouldn't something be holding a reference to the upper transport > across that call to setup the lower transport connection? I didn't see other transports holding an extra reference. But it can be added. However, I don't understand how that would prevent the socket from being closed and released while the lower transport still owns it, which is what is happening here. -- Chuck Lever