On 23-1-2017 14:12, kefu chai wrote:
> On Mon, Jan 23, 2017 at 1:32 PM, Ming Lin <minggr@xxxxxxxxx> wrote:
>> Hi,
>>
>> [mon]
>> mon initial members = a
>>
>> [mon.a]
>> mon addr = 192.168.0.1:7000
>>
>> jewel release with 1 monitor and 3 OSDs.
>>
>> After hours of fio rbd test, the port the monitor was bound to (7000)
>> was closed for unknown reason.
>
> is this reproducible? can you enable ms-debug=1 and attach the log?
>
>> But monitor is still running,
>>
>> # pidof ceph-mon
>> 21980
>>
>> Another strange thing is there are more than 1000 threads for the
>> ceph-mon process.
>>
>> # ls /proc/21980/task/ |wc -l
>> 1022
>>
>> Any possible reason?
>
> are you using the simple messenger? maybe you can use pstree to check
> the thread names?

Other useful tools for finding out where the socket has gone:
    sockstat
    lsof

They can tell you whether the socket is open somewhere else and what
state it is in, or which other port(s) the monitor has open.

Also note that recycling through large numbers of ports does not always
go well with the TCP stack. That is why there are these socket options:
    SO_REUSEADDR    enables local address reuse
    SO_REUSEPORT    enables duplicate address and port bindings

Even then you might still run out of ports if too many are left in a
WAIT state such as TIME_WAIT.

--WjW
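
PS: a minimal Python sketch of the two ideas above, in case it helps to
experiment outside of Ceph. It is not what ceph-mon does internally. The
helper names open_listener and is_listening are made up for this example.
The first binds a listening socket with SO_REUSEADDR set, the second
checks from the outside whether anything still accepts connections on a
given address and port.

```python
import socket


def open_listener(host="127.0.0.1", port=0, reuse_addr=True):
    """Bind and listen on host:port. Port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse_addr:
        # Allow rebinding the local address while old connections
        # are still lingering in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(8)
    return sock


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connect to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, and so on.
        return False


if __name__ == "__main__":
    listener = open_listener()
    host, port = listener.getsockname()
    print("listening before close:", is_listening(host, port))
    listener.close()
    print("listening after close:", is_listening(host, port))
```

Against the setup in this thread, the equivalent check would be
is_listening("192.168.0.1", 7000), run from a host that can reach the
monitor. It only shows whether the port accepts connections; sockstat or
lsof are still needed to see which process, if any, holds the socket.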