Re: How to use async with rdma

resend

On Tue, Dec 27, 2016 at 6:27 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>
> rdma doesn't support daemonize... we may need to push https://github.com/ceph/ceph/pull/10600 to make it work.
>
> if you want to deploy rdma now, you need to do it manually
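>
> a minimal sketch of a manual run (assuming the mon id "ceph21" and the default cluster name from your conf, and an osd id of 0 as a placeholder; adjust ids/paths as needed):
>
> # run the monitor in the foreground so the daemonize path is never hit
> sudo ceph-mon -f --cluster ceph -i ceph21 --setuser ceph --setgroup ceph
>
> # an osd can be started the same way once it has been prepared
> sudo ceph-osd -f --cluster ceph -i 0 --setuser ceph --setgroup ceph
>
> the -f flag keeps the daemon in the foreground instead of daemonizing.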
>
> On Tue, Dec 27, 2016 at 5:02 PM, Dong Wu <archer.wudong@xxxxxxxxx> wrote:
>>
>> Sorry for mistakenly sending an empty email earlier; here is the content:
>>
>> Hi, Haomai
>>     I want to test the async messenger with RDMA, but when I ceph-deploy a
>> cluster with the following conf, it fails.
>>
>> ##ceph.conf##
>> [global]
>> fsid = d841b987-c6d6-4267-923a-2ad3e30a6e9b
>> mon_initial_members = ceph21
>> mon_host = 192.168.1.2
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>>
>> ms_type = async
>> ms_async_transport_type = rdma
>> ms_async_rdma_device_name = mlx4_0
>> ms_async_rdma_port_num = 1
>>
>> osd_crush_chooseleaf_type = 0
>> osd_pool_default_size = 1
>> rbd_default_features = 1
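>>
>> (side note: a quick way to confirm that ms_async_rdma_device_name and
>> ms_async_rdma_port_num match what the verbs stack actually exposes is
>> something like the following, assuming libibverbs-utils or an equivalent
>> package is installed:)
>>
>> $ ibv_devinfo -d mlx4_0 -i 1
>>
>> (the device name and port state reported there should line up with the
>> values above.)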
>>
>> ##here is ceph-deploy failed info##
>> cepher@ceph21:~/ceph_deploy$ ceph-deploy mon create-initial
>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>> /home/cepher/.cephdeploy.conf
>> [ceph_deploy.cli][INFO  ] Invoked (1.5.35): /usr/bin/ceph-deploy mon
>> create-initial
>> [ceph_deploy.cli][INFO  ] ceph-deploy options:
>> [ceph_deploy.cli][INFO  ]  username                      : None
>> [ceph_deploy.cli][INFO  ]  verbose                       : False
>> [ceph_deploy.cli][INFO  ]  overwrite_conf                : False
>> [ceph_deploy.cli][INFO  ]  subcommand                    : create-initial
>> [ceph_deploy.cli][INFO  ]  quiet                         : False
>> [ceph_deploy.cli][INFO  ]  cd_conf                       :
>> <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fe9ab01bbd8>
>> [ceph_deploy.cli][INFO  ]  cluster                       : ceph
>> [ceph_deploy.cli][INFO  ]  func                          : <function
>> mon at 0x7fe9ab490140>
>> [ceph_deploy.cli][INFO  ]  ceph_conf                     : None
>> [ceph_deploy.cli][INFO  ]  keyrings                      : None
>> [ceph_deploy.cli][INFO  ]  default_release               : False
>> [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph21
>> [ceph_deploy.mon][DEBUG ] detecting platform for host ceph21 ...
>> [ceph21][DEBUG ] connection detected need for sudo
>> [ceph21][DEBUG ] connected to host: ceph21
>> [ceph21][DEBUG ] detect platform information from remote host
>> [ceph21][DEBUG ] detect machine type
>> [ceph21][DEBUG ] find the location of an executable
>> [ceph_deploy.mon][INFO  ] distro info: debian 8.5 jessie
>> [ceph21][DEBUG ] determining if provided host has same hostname in remote
>> [ceph21][DEBUG ] get remote short hostname
>> [ceph21][DEBUG ] deploying mon to ceph21
>> [ceph21][DEBUG ] get remote short hostname
>> [ceph21][DEBUG ] remote hostname: ceph21
>> [ceph21][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>> [ceph21][DEBUG ] create the mon path if it does not exist
>> [ceph21][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph21/done
>> [ceph21][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-ceph21/done
>> [ceph21][INFO  ] creating keyring file:
>> /var/lib/ceph/tmp/ceph-ceph21.mon.keyring
>> [ceph21][DEBUG ] create the monitor keyring file
>> [ceph21][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs
>> -i ceph21 --keyring /var/lib/ceph/tmp/ceph-ceph21.mon.keyring
>> --setuser 64045 --setgroup 64045
>> [ceph21][DEBUG ] ceph-mon: mon.noname-a 192.168.1.2:6789/0 is local,
>> renaming to mon.ceph21
>> [ceph21][DEBUG ] ceph-mon: set fsid to d841b987-c6d6-4267-923a-2ad3e30a6e9b
>> [ceph21][DEBUG ] ceph-mon: created monfs at
>> /var/lib/ceph/mon/ceph-ceph21 for mon.ceph21
>> [ceph21][INFO  ] unlinking keyring file
>> /var/lib/ceph/tmp/ceph-ceph21.mon.keyring
>> [ceph21][DEBUG ] create a done file to avoid re-doing the mon deployment
>> [ceph21][DEBUG ] create the init path if it does not exist
>> [ceph21][INFO  ] Running command: sudo systemctl enable ceph.target
>> [ceph21][INFO  ] Running command: sudo systemctl enable ceph-mon@ceph21
>> [ceph21][INFO  ] Running command: sudo systemctl start ceph-mon@ceph21
>> [ceph21][INFO  ] Running command: sudo ceph --cluster=ceph
>> --admin-daemon /var/run/ceph/ceph-mon.ceph21.asok mon_status
>> [ceph21][ERROR ] admin_socket: exception getting command descriptions:
>> [Errno 111] Connection refused
>> [ceph21][WARNIN] monitor: mon.ceph21, might not be running yet
>> [ceph21][INFO  ] Running command: sudo ceph --cluster=ceph
>> --admin-daemon /var/run/ceph/ceph-mon.ceph21.asok mon_status
>> [ceph21][ERROR ] admin_socket: exception getting command descriptions:
>> [Errno 111] Connection refused
>> [ceph21][WARNIN] monitor ceph21 does not exist in monmap
>> [ceph21][WARNIN] neither `public_addr` nor `public_network` keys are
>> defined for monitors
>> [ceph21][WARNIN] monitors may not be able to form quorum
>> [ceph_deploy.mon][INFO  ] processing monitor mon.ceph21
>> [ceph21][DEBUG ] connection detected need for sudo
>> [ceph21][DEBUG ] connected to host: ceph21
>> [ceph21][DEBUG ] detect platform information from remote host
>> [ceph21][DEBUG ] detect machine type
>> [ceph21][DEBUG ] find the location of an executable
>> [ceph21][INFO  ] Running command: sudo ceph --cluster=ceph
>> --admin-daemon /var/run/ceph/ceph-mon.ceph21.asok mon_status
>> [ceph21][ERROR ] admin_socket: exception getting command descriptions:
>> [Errno 111] Connection refused
>> [ceph_deploy.mon][WARNIN] mon.ceph21 monitor is not yet in quorum, tries left: 5
>> [ceph_deploy.mon][WARNIN] waiting 5 seconds before retrying
>> [ceph21][INFO  ] Running command: sudo ceph --cluster=ceph
>> --admin-daemon /var/run/ceph/ceph-mon.ceph21.asok mon_status
>> [ceph21][ERROR ] admin_socket: exception getting command descriptions:
>> [Errno 111] Connection refused
>> [ceph_deploy.mon][WARNIN] mon.ceph21 monitor is not yet in quorum, tries left: 4
>> [ceph_deploy.mon][WARNIN] waiting 10 seconds before retrying
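>>
>> (for reference: the "Connection refused" on the admin socket just means the
>> mon process never came up; assuming the standard systemd unit name and
>> default log locations, the underlying failure should show up with:)
>>
>> $ sudo systemctl status ceph-mon@ceph21
>> $ sudo journalctl -u ceph-mon@ceph21 --no-pager
>> $ sudo tail -n 100 /var/log/ceph/ceph-mon.ceph21.log
>>
>> (the public_addr/public_network warning is separate from the rdma problem;
>> it could be silenced by adding e.g. "public_network = 192.168.1.0/24" to
>> [global], assuming that is the right subnet for mon_host.)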
>>
>>
>> When I first deploy a cluster with async over posix, then change
>> ceph.conf to use rdma and restart ceph-mon, the mon crashes.
>> ##first use the ceph.conf to deploy a cluster##
>> [global]
>> fsid = d841b987-c6d6-4267-923a-2ad3e30a6e9b
>> mon_initial_members = ceph21
>> mon_host = 192.168.1.2
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>>
>> ms_type = async
>>
>> osd_crush_chooseleaf_type = 0
>> osd_pool_default_size = 1
>> rbd_default_features = 1
>>
>> ##then change ceph.conf##
>> ms_async_transport_type = rdma
>> ms_async_rdma_device_name = mlx4_0
>> ms_async_rdma_port_num = 1
>>
>> ##then restart ceph-mon, and mon crash##
>> 2016-12-27 13:16:00.834116 7fd245de67c0  0 set uid:gid to 64045:64045
>> (ceph:ceph)
>> 2016-12-27 13:16:00.834136 7fd245de67c0  0 ceph version 11.1.1
>> (87597971b371d7f497d7eabad3545d72d18dd755), process ceph-mon, pid
>> 56904
>> 2016-12-27 13:16:00.834202 7fd245de67c0  0 pidfile_write: ignore empty
>> --pid-file
>> 2016-12-27 13:16:00.846997 7fd245de67c0  0 load: jerasure load: lrc load: isa
>> 2016-12-27 13:16:00.847275 7fd245de67c0  1 leveldb: Recovering log #3
>> 2016-12-27 13:16:00.847326 7fd245de67c0  1 leveldb: Level-0 table #5: started
>> 2016-12-27 13:16:00.872185 7fd245de67c0  1 leveldb: Level-0 table #5:
>> 574 bytes OK
>> 2016-12-27 13:16:00.918333 7fd245de67c0  1 leveldb: Delete type=0 #3
>>
>> 2016-12-27 13:16:00.918379 7fd245de67c0  1 leveldb: Delete type=3 #2
>>
>> 2016-12-27 13:16:00.919215 7fd245de67c0 -1 *** Caught signal
>> (Segmentation fault) **
>>  in thread 7fd245de67c0 thread_name:ceph-mon
>>
>>  ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755)
>>  1: (()+0x722bc7) [0x7fd24587bbc7]
>>  2: (()+0xf8d0) [0x7fd2439d48d0]
>>  3: (pthread_spin_lock()+0) [0x7fd2439d1bd0]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> --- begin dump of recent events ---
>>    -32> 2016-12-27 13:16:00.826676 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command perfcounters_dump hook
>> 0x7fd24eac0030
>>    -31> 2016-12-27 13:16:00.826690 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command 1 hook 0x7fd24eac0030
>>    -30> 2016-12-27 13:16:00.826694 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command perf dump hook 0x7fd24eac0030
>>    -29> 2016-12-27 13:16:00.826731 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command perfcounters_schema hook
>> 0x7fd24eac0030
>>    -28> 2016-12-27 13:16:00.826734 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command 2 hook 0x7fd24eac0030
>>    -27> 2016-12-27 13:16:00.826745 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command perf schema hook 0x7fd24eac0030
>>    -26> 2016-12-27 13:16:00.826749 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command perf reset hook 0x7fd24eac0030
>>    -25> 2016-12-27 13:16:00.826757 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command config show hook 0x7fd24eac0030
>>    -24> 2016-12-27 13:16:00.826767 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command config set hook 0x7fd24eac0030
>>    -23> 2016-12-27 13:16:00.826771 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command config get hook 0x7fd24eac0030
>>    -22> 2016-12-27 13:16:00.826774 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command config diff hook 0x7fd24eac0030
>>    -21> 2016-12-27 13:16:00.826784 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command log flush hook 0x7fd24eac0030
>>    -20> 2016-12-27 13:16:00.826787 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command log dump hook 0x7fd24eac0030
>>    -19> 2016-12-27 13:16:00.826796 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command log reopen hook 0x7fd24eac0030
>>    -18> 2016-12-27 13:16:00.826808 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command dump_mempools hook
>> 0x7fd24ebc1548
>>    -17> 2016-12-27 13:16:00.834116 7fd245de67c0  0 set uid:gid to
>> 64045:64045 (ceph:ceph)
>>    -16> 2016-12-27 13:16:00.834136 7fd245de67c0  0 ceph version 11.1.1
>> (87597971b371d7f497d7eabad3545d72d18dd755), process ceph-mon, pid
>> 56904
>>    -15> 2016-12-27 13:16:00.834202 7fd245de67c0  0 pidfile_write:
>> ignore empty --pid-file
>>    -14> 2016-12-27 13:16:00.836477 7fd245de67c0  5
>> asok(0x7fd24ebfc000) init /var/run/ceph/ceph-mon.ceph21.asok
>>    -13> 2016-12-27 13:16:00.836489 7fd245de67c0  5
>> asok(0x7fd24ebfc000) bind_and_listen
>> /var/run/ceph/ceph-mon.ceph21.asok
>>    -12> 2016-12-27 13:16:00.836518 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command 0 hook 0x7fd24eabe0c8
>>    -11> 2016-12-27 13:16:00.836531 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command version hook 0x7fd24eabe0c8
>>    -10> 2016-12-27 13:16:00.836535 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command git_version hook 0x7fd24eabe0c8
>>     -9> 2016-12-27 13:16:00.836545 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command help hook 0x7fd24eac01e0
>>     -8> 2016-12-27 13:16:00.836550 7fd245de67c0  5
>> asok(0x7fd24ebfc000) register_command get_command_descriptions hook
>> 0x7fd24eac01f0
>>     -7> 2016-12-27 13:16:00.836606 7fd23f576700  5
>> asok(0x7fd24ebfc000) entry start
>>     -6> 2016-12-27 13:16:00.846997 7fd245de67c0  0 load: jerasure
>> load: lrc load: isa
>>     -5> 2016-12-27 13:16:00.847275 7fd245de67c0  1 leveldb: Recovering log #3
>>     -4> 2016-12-27 13:16:00.847326 7fd245de67c0  1 leveldb: Level-0
>> table #5: started
>>     -3> 2016-12-27 13:16:00.872185 7fd245de67c0  1 leveldb: Level-0
>> table #5: 574 bytes OK
>>     -2> 2016-12-27 13:16:00.918333 7fd245de67c0  1 leveldb: Delete type=0 #3
>>
>>     -1> 2016-12-27 13:16:00.918379 7fd245de67c0  1 leveldb: Delete type=3 #2
>>
>>      0> 2016-12-27 13:16:00.919215 7fd245de67c0 -1 *** Caught signal
>> (Segmentation fault) **
>>  in thread 7fd245de67c0 thread_name:ceph-mon
>>
>>  ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755)
>>  1: (()+0x722bc7) [0x7fd24587bbc7]
>>  2: (()+0xf8d0) [0x7fd2439d48d0]
>>  3: (pthread_spin_lock()+0) [0x7fd2439d1bd0]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
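>>
>> (a rough way to resolve those frames, assuming frame 1 lands in the ceph-mon
>> binary itself and that the matching build plus its debug symbols, e.g. the
>> ceph-dbg package for 11.1.1, are installed, is to feed the offset from
>> frame 1 to addr2line:)
>>
>> $ addr2line -Cfe /usr/bin/ceph-mon 0x722bc7
>>
>> (or disassemble around it with objdump -rdS /usr/bin/ceph-mon, as the log
>> suggests.)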
>>
>>
>>
>> 2016-12-27 16:52 GMT+08:00 Dong Wu <archer.wudong@xxxxxxxxx>:
>> > HI, Haomai
>
>
>
>
> --
>
> Best Regards,
>
> Wheat




-- 

Best Regards,

Wheat


