Sorry for sending an empty email earlier; here is the content:

Hi Haomai,

I want to test the async messenger with RDMA, but when I deploy a cluster with ceph-deploy using the following conf, it fails.

##ceph.conf##
[global]
fsid = d841b987-c6d6-4267-923a-2ad3e30a6e9b
mon_initial_members = ceph21
mon_host = 192.168.1.2
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
ms_type = async
ms_async_transport_type = rdma
ms_async_rdma_device_name = mlx4_0
ms_async_rdma_port_num = 1
osd_crush_chooseleaf_type = 0
osd_pool_default_size = 1
rbd_default_features = 1

##here is the ceph-deploy failure output##
cepher@ceph21:~/ceph_deploy$ ceph-deploy mon create-initial
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cepher/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.35): /usr/bin/ceph-deploy mon create-initial
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create-initial
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fe9ab01bbd8>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] func : <function mon at 0x7fe9ab490140>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] keyrings : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph21
[ceph_deploy.mon][DEBUG ] detecting platform for host ceph21 ...
[ceph21][DEBUG ] connection detected need for sudo
[ceph21][DEBUG ] connected to host: ceph21
[ceph21][DEBUG ] detect platform information from remote host
[ceph21][DEBUG ] detect machine type
[ceph21][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO ] distro info: debian 8.5 jessie
[ceph21][DEBUG ] determining if provided host has same hostname in remote
[ceph21][DEBUG ] get remote short hostname
[ceph21][DEBUG ] deploying mon to ceph21
[ceph21][DEBUG ] get remote short hostname
[ceph21][DEBUG ] remote hostname: ceph21
[ceph21][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph21][DEBUG ] create the mon path if it does not exist
[ceph21][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph21/done
[ceph21][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-ceph21/done
[ceph21][INFO ] creating keyring file: /var/lib/ceph/tmp/ceph-ceph21.mon.keyring
[ceph21][DEBUG ] create the monitor keyring file
[ceph21][INFO ] Running command: sudo ceph-mon --cluster ceph --mkfs -i ceph21 --keyring /var/lib/ceph/tmp/ceph-ceph21.mon.keyring --setuser 64045 --setgroup 64045
[ceph21][DEBUG ] ceph-mon: mon.noname-a 192.168.1.2:6789/0 is local, renaming to mon.ceph21
[ceph21][DEBUG ] ceph-mon: set fsid to d841b987-c6d6-4267-923a-2ad3e30a6e9b
[ceph21][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-ceph21 for mon.ceph21
[ceph21][INFO ] unlinking keyring file /var/lib/ceph/tmp/ceph-ceph21.mon.keyring
[ceph21][DEBUG ] create a done file to avoid re-doing the mon deployment
[ceph21][DEBUG ] create the init path if it does not exist
[ceph21][INFO ] Running command: sudo systemctl enable ceph.target
[ceph21][INFO ] Running command: sudo systemctl enable ceph-mon@ceph21
[ceph21][INFO ] Running command: sudo systemctl start ceph-mon@ceph21
[ceph21][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph21.asok mon_status
[ceph21][ERROR ] admin_socket: exception getting command descriptions: [Errno 111] Connection refused
[ceph21][WARNIN] monitor: mon.ceph21, might not be running yet
[ceph21][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph21.asok mon_status
[ceph21][ERROR ] admin_socket: exception getting command descriptions: [Errno 111] Connection refused
[ceph21][WARNIN] monitor ceph21 does not exist in monmap
[ceph21][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
[ceph21][WARNIN] monitors may not be able to form quorum
[ceph_deploy.mon][INFO ] processing monitor mon.ceph21
[ceph21][DEBUG ] connection detected need for sudo
[ceph21][DEBUG ] connected to host: ceph21
[ceph21][DEBUG ] detect platform information from remote host
[ceph21][DEBUG ] detect machine type
[ceph21][DEBUG ] find the location of an executable
[ceph21][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph21.asok mon_status
[ceph21][ERROR ] admin_socket: exception getting command descriptions: [Errno 111] Connection refused
[ceph_deploy.mon][WARNIN] mon.ceph21 monitor is not yet in quorum, tries left: 5
[ceph_deploy.mon][WARNIN] waiting 5 seconds before retrying
[ceph21][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph21.asok mon_status
[ceph21][ERROR ] admin_socket: exception getting command descriptions: [Errno 111] Connection refused
[ceph_deploy.mon][WARNIN] mon.ceph21 monitor is not yet in quorum, tries left: 4
[ceph_deploy.mon][WARNIN] waiting 10 seconds before retrying

When I instead deploy the cluster with async posix first, then change ceph.conf to use rdma and restart ceph-mon, the mon crashes.

##first use the ceph.conf to deploy a cluster##
[global]
fsid = d841b987-c6d6-4267-923a-2ad3e30a6e9b
mon_initial_members = ceph21
mon_host = 192.168.1.2
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
ms_type = async
osd_crush_chooseleaf_type = 0
osd_pool_default_size = 1
rbd_default_features = 1

##then change ceph.conf##
ms_async_transport_type = rdma
ms_async_rdma_device_name = mlx4_0
ms_async_rdma_port_num = 1

##then restart ceph-mon, and mon crash##
2016-12-27 13:16:00.834116 7fd245de67c0 0 set uid:gid to 64045:64045 (ceph:ceph)
2016-12-27 13:16:00.834136 7fd245de67c0 0 ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755), process ceph-mon, pid 56904
2016-12-27 13:16:00.834202 7fd245de67c0 0 pidfile_write: ignore empty --pid-file
2016-12-27 13:16:00.846997 7fd245de67c0 0 load: jerasure load: lrc load: isa
2016-12-27 13:16:00.847275 7fd245de67c0 1 leveldb: Recovering log #3
2016-12-27 13:16:00.847326 7fd245de67c0 1 leveldb: Level-0 table #5: started
2016-12-27 13:16:00.872185 7fd245de67c0 1 leveldb: Level-0 table #5: 574 bytes OK
2016-12-27 13:16:00.918333 7fd245de67c0 1 leveldb: Delete type=0 #3
2016-12-27 13:16:00.918379 7fd245de67c0 1 leveldb: Delete type=3 #2
2016-12-27 13:16:00.919215 7fd245de67c0 -1 *** Caught signal (Segmentation fault) **
 in thread 7fd245de67c0 thread_name:ceph-mon
 ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755)
 1: (()+0x722bc7) [0x7fd24587bbc7]
 2: (()+0xf8d0) [0x7fd2439d48d0]
 3: (pthread_spin_lock()+0) [0x7fd2439d1bd0]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-32> 2016-12-27 13:16:00.826676 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command perfcounters_dump hook 0x7fd24eac0030
-31> 2016-12-27 13:16:00.826690 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command 1 hook 0x7fd24eac0030
-30> 2016-12-27 13:16:00.826694 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command perf dump hook 0x7fd24eac0030
-29> 2016-12-27 13:16:00.826731 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command perfcounters_schema hook 0x7fd24eac0030
-28> 2016-12-27 13:16:00.826734 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command 2 hook 0x7fd24eac0030
-27> 2016-12-27 13:16:00.826745 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command perf schema hook 0x7fd24eac0030
-26> 2016-12-27 13:16:00.826749 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command perf reset hook 0x7fd24eac0030
-25> 2016-12-27 13:16:00.826757 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command config show hook 0x7fd24eac0030
-24> 2016-12-27 13:16:00.826767 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command config set hook 0x7fd24eac0030
-23> 2016-12-27 13:16:00.826771 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command config get hook 0x7fd24eac0030
-22> 2016-12-27 13:16:00.826774 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command config diff hook 0x7fd24eac0030
-21> 2016-12-27 13:16:00.826784 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command log flush hook 0x7fd24eac0030
-20> 2016-12-27 13:16:00.826787 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command log dump hook 0x7fd24eac0030
-19> 2016-12-27 13:16:00.826796 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command log reopen hook 0x7fd24eac0030
-18> 2016-12-27 13:16:00.826808 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command dump_mempools hook 0x7fd24ebc1548
-17> 2016-12-27 13:16:00.834116 7fd245de67c0 0 set uid:gid to 64045:64045 (ceph:ceph)
-16> 2016-12-27 13:16:00.834136 7fd245de67c0 0 ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755), process ceph-mon, pid 56904
-15> 2016-12-27 13:16:00.834202 7fd245de67c0 0 pidfile_write: ignore empty --pid-file
-14> 2016-12-27 13:16:00.836477 7fd245de67c0 5 asok(0x7fd24ebfc000) init /var/run/ceph/ceph-mon.ceph21.asok
-13> 2016-12-27 13:16:00.836489 7fd245de67c0 5 asok(0x7fd24ebfc000) bind_and_listen /var/run/ceph/ceph-mon.ceph21.asok
-12> 2016-12-27 13:16:00.836518 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command 0 hook 0x7fd24eabe0c8
-11> 2016-12-27 13:16:00.836531 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command version hook 0x7fd24eabe0c8
-10> 2016-12-27 13:16:00.836535 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command git_version hook 0x7fd24eabe0c8
-9> 2016-12-27 13:16:00.836545 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command help hook 0x7fd24eac01e0
-8> 2016-12-27 13:16:00.836550 7fd245de67c0 5 asok(0x7fd24ebfc000) register_command get_command_descriptions hook 0x7fd24eac01f0
-7> 2016-12-27 13:16:00.836606 7fd23f576700 5 asok(0x7fd24ebfc000) entry start
-6> 2016-12-27 13:16:00.846997 7fd245de67c0 0 load: jerasure load: lrc load: isa
-5> 2016-12-27 13:16:00.847275 7fd245de67c0 1 leveldb: Recovering log #3
-4> 2016-12-27 13:16:00.847326 7fd245de67c0 1 leveldb: Level-0 table #5: started
-3> 2016-12-27 13:16:00.872185 7fd245de67c0 1 leveldb: Level-0 table #5: 574 bytes OK
-2> 2016-12-27 13:16:00.918333 7fd245de67c0 1 leveldb: Delete type=0 #3
-1> 2016-12-27 13:16:00.918379 7fd245de67c0 1 leveldb: Delete type=3 #2
0> 2016-12-27 13:16:00.919215 7fd245de67c0 -1 *** Caught signal (Segmentation fault) **
 in thread 7fd245de67c0 thread_name:ceph-mon
 ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755)
 1: (()+0x722bc7) [0x7fd24587bbc7]
 2: (()+0xf8d0) [0x7fd2439d48d0]
 3: (pthread_spin_lock()+0) [0x7fd2439d1bd0]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
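
For anyone trying to reproduce this, here is a minimal sketch of a pre-flight check on the RDMA settings in the conf above. It only rules out a missing key or a misspelled device name; it does not explain the segfault. The function name check_rdma_conf and its sysfs_root parameter are hypothetical, made up for illustration, and the check assumes a Linux host where RDMA devices are listed under /sys/class/infiniband.

```python
import configparser
import os

# Keys taken from the ceph.conf in this report.
RDMA_KEYS = ("ms_async_rdma_device_name", "ms_async_rdma_port_num")


def check_rdma_conf(conf_text, sysfs_root="/sys/class/infiniband"):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: it inspects only the [global] section and only
    checks that the named device directory exists under sysfs_root.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("global"):
        return ["no [global] section"]
    section = parser["global"]

    problems = []
    if section.get("ms_type") != "async":
        problems.append("ms_type is not async")
    if section.get("ms_async_transport_type") != "rdma":
        # Not an RDMA setup, so the remaining checks do not apply.
        return problems

    for key in RDMA_KEYS:
        if key not in section:
            problems.append("missing " + key)

    device = section.get("ms_async_rdma_device_name")
    if device and not os.path.exists(os.path.join(sysfs_root, device)):
        problems.append("device %s not found under %s" % (device, sysfs_root))
    return problems
```

Calling check_rdma_conf(open("/etc/ceph/ceph.conf").read()) on the mon host would then print nothing suspicious if mlx4_0 is visible to the kernel; an empty result says nothing about whether the messenger itself works.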

2016-12-27 16:52 GMT+08:00 Dong Wu <archer.wudong@xxxxxxxxx>:
> HI, Haomai