Hi Matthew,

I found a workaround for my (our) problem: in the librdmacm code, rsocket.c, there is a global constant polling_time, which is currently set to 10 microseconds. I raised this to 10000 - and all of a sudden things work nicely.

I think we are looking at two issues here:

1. the thread structure of the ceph messenger

For a given socket connection, there are 3 threads of interest here: the main messenger thread, the Pipe::reader and the Pipe::writer.

For a ceph client like the ceph admin command, I see the following sequence:
- the connection to the ceph monitor is created by the main messenger thread, and the Pipe::reader and Pipe::writer are instantiated
- the requested command is sent to the ceph monitor, the answer is read and printed
- at this point the Pipe::reader has already called tcp_read_wait(), polling for more data or connection termination
- after the response has been printed, the main loop calls the shutdown routines, which in turn shutdown() the socket

There is some time between the last two steps - and this gap is long enough to open a race:

2. rpoll, ibv and poll

The rpoll implementation in rsockets is split into 2 phases:
- a busy loop which checks the state of the underlying ibv queue pair
- a call to the real poll() system call (i.e. the uverbs implementation of poll() inside the kernel)

The busy loop has a maximum duration of polling_time (10 microseconds by default) and is able to detect the local shutdown, returning a POLLHUP. The poll() system call, by contrast, does not detect the local shutdown - it only returns after the caller-supplied timeout expires.

Increasing the rsockets polling_time from 10 to 10000 microseconds results in rpoll detecting the local shutdown within the busy loop. Decreasing the ceph "ms tcp read timeout" from the default of 900 to 5 seconds serves a similar purpose, but is much coarser.
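For reference, the coarser workaround on the ceph side would be a configuration change rather than a code change; in ceph.conf it would look something like this (the option name and the 900-second default are from the discussion above, the section placement is the usual one):

```
[global]
    ms tcp read timeout = 5
```

This only shortens how long the reader sits in tcp_read_wait() before timing out, so it narrows the race window rather than closing it.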