Re: RDMA/RoCE enablement failed with (113) No route to host

Thanks, Roman.

I'm fairly sure my RDMA fabric is working correctly, for two reasons:

(1) The E8 Storage agent running on all OSD nodes uses RDMA to communicate with our E8 Storage controller, and it is working correctly at the moment. The volumes are available and IO runs at full line rate with the expected latency. If RDMA were broken, communication between the OSD servers and the E8 controller would suffer first, and that would not go unnoticed by me.
(2) ib_send_bw tests complete correctly between any two nodes in the cluster, in both directions (-b switch) and at all message sizes (-a switch); see the example below.
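
For reference, the kind of run I mean looks roughly like this (mlx5_0 and bonjovi1 are just placeholders here for the device and the remote node):

# on the remote node, start the server side
ib_send_bw -d mlx5_0 -a

# on the local node, run the client against it (add -b on both ends for the bidirectional pass)
ib_send_bw -d mlx5_0 -a bonjovi1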

So I decided to roll back all of the RDMA parameters in ceph.conf and enable only ms_type = async+rdma.
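
That leaves the messenger part of [global] looking roughly like this (sketched from memory, so treat the exact commenting as approximate):

[global]
ms_type = async+rdma
#ms_async_rdma_device_name = mlx5_0
#ms_async_rdma_polling_us = 0
#ms_async_rdma_local_gid = <node's_gid>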

Following that, I was able to bring up the monitors, MDSs and managers. But the OSD daemons refuse to start when invoked with systemctl start ceph-osd.target.

The journal shows:

Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@4.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@12.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@1.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@13.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@15.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@11.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@3.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@10.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@6.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@8.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@5.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@2.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@14.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@9.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: ceph-osd@7.service holdoff time over, scheduling restart.
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.7...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.9...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.14...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.2...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.5...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.8...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.6...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.10...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.3...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.11...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.15...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.13...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.1...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.12...
Dec 20 02:12:26 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.4...
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.7.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.9.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.6.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.2.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.14.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.11.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.5.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.8.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.10.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.15.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.13.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.1.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.3.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.12.
Dec 20 02:12:26 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.4.
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 2018-12-20 02:12:27.031 7f0f7864b700 -1 Infiniband binding_port  port not found/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/Infiniband.cc...
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/Infiniband.cc: 146: FAILED assert(active_port)
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/Infiniband.cc: In function 'void Device::binding_port(CephContext*, int)' thread 7f0ab407c70...
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/Infiniband.cc: 146: FAILED assert(active_port)
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: 2018-12-20 02:12:27.032 7f0ab407c700 -1 Infiniband binding_port  port not found
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f0f809476bf]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 2: (()+0x285887) [0x7f0f80947887]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 3: (Device::binding_port(CephContext*, int)+0x1c5) [0x7f0f80a89d65]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 4: (Infiniband::init()+0x159) [0x7f0f80a8e4c9]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 5: (RDMAWorker::connect(entity_addr_t const&, SocketOptions const&, ConnectedSocket*)+0x2a) [0x7f0f80a9462a]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 6: (AsyncConnection::_process_connection()+0x2fd) [0x7f0f80a6bcfd]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 7: (AsyncConnection::process()+0x640) [0x7f0f80a6ffb0]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 8: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa4f) [0x7f0f80a80aaf]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 9: (()+0x3c15cc) [0x7f0f80a835cc]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 10: (()+0x6afaef) [0x7f0f80d71aef]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 11: (()+0x7e25) [0x7f0f7d4a0e25]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: 12: (clone()+0x6d) [0x7f0f7c590bad]
Dec 20 02:12:27 bonjovi0 ceph-osd[825171]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f0abbb776bf]
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: 2: (()+0x285887) [0x7f0abbb77887]
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: 3: (Device::binding_port(CephContext*, int)+0x1c5) [0x7f0abbcb9d65]
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: 4: (Infiniband::init()+0x159) [0x7f0abbcbe4c9]
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: 5: (RDMAWorker::connect(entity_addr_t const&, SocketOptions const&, ConnectedSocket*)+0x2a) [0x7f0abbcc462a]
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: 6: (AsyncConnection::_process_connection()+0x2fd) [0x7f0abbc9bcfd]
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: 7: (AsyncConnection::process()+0x640) [0x7f0abbc9ffb0]
Dec 20 02:12:27 bonjovi0 ceph-osd[825159]: 8: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa4f) [0x7f0abbcb0aaf]

and so on; the block of such entries repeats for each ceph-osd process.

So the error was "Infiniband binding_port port not found", and I figured that uncommenting #ms_async_rdma_device_name = mlx5_0 might help.
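
As a sanity check on that front, something along these lines should confirm the device name and whether its port is actually up (mlx5_0 and port 1 are what I expect on this hardware, but that is an assumption):

ibv_devinfo -d mlx5_0
# look for "state: PORT_ACTIVE" under the port; ibv_devinfo -v also dumps the GID table

If the active port turned out not to be port 1, ms_async_rdma_port_num would presumably be the knob to adjust (assuming I have the option name right).
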
I uncommented the device name line and then ran:

systemctl stop ceph-osd.target; for i in {0..15}; do systemctl reset-failed ceph-osd@${i}.service; done;

followed by:

systemctl start ceph-osd.target

The processes started but immediately failed again, this time with:

Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.3...                                                                                                                                                                                                                                                  [586/1687]
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.7...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.1...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.11...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.13...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.2...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.0...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.8...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.4...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.9...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.5...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.14...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.10...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.15...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.6...
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting Ceph object storage daemon osd.12...
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.13.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.3.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.1.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.7.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.2.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.0.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.11.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.9.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.8.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.4.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.5.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.14.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.10.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.6.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.12.
Dec 20 02:26:59 bonjovi0 systemd[1]: Started Ceph object storage daemon osd.15.
Dec 20 02:26:59 bonjovi0 systemd[1]: Reached target ceph target allowing to start/stop all ceph-osd@.service instances at once.
Dec 20 02:26:59 bonjovi0 systemd[1]: Starting ceph target allowing to start/stop all ceph-osd@.service instances at once.
Dec 20 02:26:59 bonjovi0 polkitd[1394]: Unregistered Authentication Agent for unix-process:837397:15266286 (system bus name :1.142040, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Dec 20 02:27:00 bonjovi0 ceph-osd[837486]: tcmalloc: large alloc 1074012160 bytes == 0x559ca561a000 @  0x7fba14b334ef 0x7fba14b52e76 0x7fba157341a9 0x7fba15736d4b 0x7fba15736138 0x7fba15736229 0x7fba157367f9 0x7fba1573c62a 0x7fba15713cfd 0x7fba15717fb0 0x7fba15728aaf 0x7fba1572b5cc 0x7fba15a19aef 0x7fba12148e25 0x7fba11238bad
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: tcmalloc: large alloc 1074012160 bytes == 0x55c396f9e000 @  0x7f9abaad74ef 0x7f9abaaf6e76 0x7f9abb6d81a9 0x7f9abb6dad4b 0x7f9abb6da138 0x7f9abb6da229 0x7f9abb6da7f9 0x7f9abb6e062a 0x7f9abb6b7cfd 0x7f9abb6bbfb0 0x7f9abb6ccaaf 0x7f9abb6cf5cc 0x7f9abb9bdaef 0x7f9ab80ece25 0x7f9ab71dcbad
Dec 20 02:27:00 bonjovi0 ceph-osd[837465]: tcmalloc: large alloc 1074012160 bytes == 0x55cd895ba000 @  0x7f4796ee64ef 0x7f4796f05e76 0x7f4797ae71a9 0x7f4797ae9d4b 0x7f4797ae9138 0x7f4797ae9229 0x7f4797ae97f9 0x7f4797aef62a 0x7f4797ac6cfd 0x7f4797acafb0 0x7f4797adbaaf 0x7f4797ade5cc 0x7f4797dccaef 0x7f47944fbe25 0x7f47935ebbad
Dec 20 02:27:00 bonjovi0 ceph-osd[837483]: tcmalloc: large alloc 1074012160 bytes == 0x5571605c2000 @  0x7ff38ceab4ef 0x7ff38cecae76 0x7ff38daac1a9 0x7ff38daaed4b 0x7ff38daae138 0x7ff38daae229 0x7ff38daae7f9 0x7ff38dab462a 0x7ff38da8bcfd 0x7ff38da8ffb0 0x7ff38daa0aaf 0x7ff38daa35cc 0x7ff38dd91aef 0x7ff38a4c0e25 0x7ff3895b0bad
Dec 20 02:27:00 bonjovi0 ceph-osd[837497]: tcmalloc: large alloc 1074012160 bytes == 0x5634f13ae000 @  0x7f170a3974ef 0x7f170a3b6e76 0x7f170af981a9 0x7f170af9ad4b 0x7f170af9a138 0x7f170af9a229 0x7f170af9a7f9 0x7f170afa062a 0x7f170af77cfd 0x7f170af7bfb0 0x7f170af8caaf 0x7f170af8f5cc 0x7f170b27daef 0x7f17079ace25 0x7f1706a9cbad
Dec 20 02:27:00 bonjovi0 ceph-osd[837500]: tcmalloc: large alloc 1074012160 bytes == 0x55e7e1f60000 @  0x7fc633a4a4ef 0x7fc633a69e76 0x7fc63464b1a9 0x7fc63464dd4b 0x7fc63464d138 0x7fc63464d229 0x7fc63464d7f9 0x7fc63465362a 0x7fc63462acfd 0x7fc63462efb0 0x7fc63463faaf 0x7fc6346425cc 0x7fc634930aef 0x7fc63105fe25 0x7fc63014fbad
Dec 20 02:27:00 bonjovi0 ceph-osd[837479]: tcmalloc: large alloc 1074012160 bytes == 0x5624de108000 @  0x7fb88d7a84ef 0x7fb88d7c7e76 0x7fb88e3a91a9 0x7fb88e3abd4b 0x7fb88e3ab138 0x7fb88e3ab229 0x7fb88e3ab7f9 0x7fb88e3b162a 0x7fb88e388cfd 0x7fb88e38cfb0 0x7fb88e39daaf 0x7fb88e3a05cc 0x7fb88e68eaef 0x7fb88adbde25 0x7fb889eadbad
Dec 20 02:27:00 bonjovi0 ceph-osd[837491]: tcmalloc: large alloc 1074012160 bytes == 0x55dd784d0000 @  0x7ffbc409e4ef 0x7ffbc40bde76 0x7ffbc4c9f1a9 0x7ffbc4ca1d4b 0x7ffbc4ca1138 0x7ffbc4ca1229 0x7ffbc4ca17f9 0x7ffbc4ca762a 0x7ffbc4c7ecfd 0x7ffbc4c82fb0 0x7ffbc4c93aaf 0x7ffbc4c965cc 0x7ffbc4f84aef 0x7ffbc16b3e25 0x7ffbc07a3bad
Dec 20 02:27:00 bonjovi0 ceph-osd[837494]: tcmalloc: large alloc 1074012160 bytes == 0x562a24d20000 @  0x7fab520dc4ef 0x7fab520fbe76 0x7fab52cdd1a9 0x7fab52cdfd4b 0x7fab52cdf138 0x7fab52cdf229 0x7fab52cdf7f9 0x7fab52ce562a 0x7fab52cbccfd 0x7fab52cc0fb0 0x7fab52cd1aaf 0x7fab52cd45cc 0x7fab52fc2aef 0x7fab4f6f1e25 0x7fab4e7e1bad
Dec 20 02:27:00 bonjovi0 ceph-osd[837493]: tcmalloc: large alloc 1074012160 bytes == 0x56256798e000 @  0x7f77489064ef 0x7f7748925e76 0x7f77495071a9 0x7f7749509d4b 0x7f7749509138 0x7f7749509229 0x7f77495097f9 0x7f774950f62a 0x7f77494e6cfd 0x7f77494eafb0 0x7f77494fbaaf 0x7f77494fe5cc 0x7f77497ecaef 0x7f7745f1be25 0x7f774500bbad
Dec 20 02:27:00 bonjovi0 ceph-osd[837489]: tcmalloc: large alloc 1074012160 bytes == 0x55ed3aeae000 @  0x7fc92a1644ef 0x7fc92a183e76 0x7fc92ad651a9 0x7fc92ad67d4b 0x7fc92ad67138 0x7fc92ad67229 0x7fc92ad677f9 0x7fc92ad6d62a 0x7fc92ad44cfd 0x7fc92ad48fb0 0x7fc92ad59aaf 0x7fc92ad5c5cc 0x7fc92b04aaef 0x7fc927779e25 0x7fc926869bad
Dec 20 02:27:00 bonjovi0 ceph-osd[837503]: tcmalloc: large alloc 1074012160 bytes == 0x55c1b7fe6000 @  0x7ffb86dd84ef 0x7ffb86df7e76 0x7ffb879d91a9 0x7ffb879dbd4b 0x7ffb879db138 0x7ffb879db229 0x7ffb879db7f9 0x7ffb879e162a 0x7ffb879b8cfd 0x7ffb879bcfb0 0x7ffb879cdaaf 0x7ffb879d05cc 0x7ffb87cbeaef 0x7ffb843ede25 0x7ffb834ddbad
Dec 20 02:27:00 bonjovi0 ceph-osd[837490]: tcmalloc: large alloc 1074012160 bytes == 0x559272eb6000 @  0x7f64468124ef 0x7f6446831e76 0x7f64474131a9 0x7f6447415d4b 0x7f6447415138 0x7f6447415229 0x7f64474157f9 0x7f644741b62a 0x7f64473f2cfd 0x7f64473f6fb0 0x7f6447407aaf 0x7f644740a5cc 0x7f64476f8aef 0x7f6443e27e25 0x7f6442f17bad
Dec 20 02:27:00 bonjovi0 ceph-osd[837473]: tcmalloc: large alloc 1074012160 bytes == 0x56325cc7c000 @  0x7f0ab0d804ef 0x7f0ab0d9fe76 0x7f0ab19811a9 0x7f0ab1983d4b 0x7f0ab1983138 0x7f0ab1983229 0x7f0ab19837f9 0x7f0ab198962a 0x7f0ab1960cfd 0x7f0ab1964fb0 0x7f0ab1975aaf 0x7f0ab19785cc 0x7f0ab1c66aef 0x7f0aae395e25 0x7f0aad485bad
Dec 20 02:27:00 bonjovi0 ceph-osd[837481]: tcmalloc: large alloc 1074012160 bytes == 0x562bd82ac000 @  0x7fbaf08094ef 0x7fbaf0828e76 0x7fbaf140a1a9 0x7fbaf140cd4b 0x7fbaf140c138 0x7fbaf140c229 0x7fbaf140c7f9 0x7fbaf141262a 0x7fbaf13e9cfd 0x7fbaf13edfb0 0x7fbaf13feaaf 0x7fbaf14015cc 0x7fbaf16efaef 0x7fbaede1ee25 0x7fbaecf0ebad
Dec 20 02:27:00 bonjovi0 ceph-osd[837495]: tcmalloc: large alloc 1074012160 bytes == 0x55dbd221c000 @  0x7ff9af95e4ef 0x7ff9af97de76 0x7ff9b055f1a9 0x7ff9b0561d4b 0x7ff9b0561138 0x7ff9b0561229 0x7ff9b05617f9 0x7ff9b056762a 0x7ff9b053ecfd 0x7ff9b0542fb0 0x7ff9b0553aaf 0x7ff9b05565cc 0x7ff9b0844aef 0x7ff9acf73e25 0x7ff9ac063bad
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: *** Caught signal (Aborted) **
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: in thread 7f9ab0202700 thread_name:rdma-polling
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 2018-12-20 02:27:00.342 7f9ab0202700 -1 RDMAStack polling poll failed -4
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 1: (()+0x902970) [0x55c38d155970]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 2: (()+0xf6d0) [0x7f9ab80f46d0]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 3: (gsignal()+0x37) [0x7f9ab7114277]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 4: (abort()+0x148) [0x7f9ab7115968]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 5: (RDMADispatcher::polling()+0x1084) [0x7f9abb6e4c14]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 6: (()+0x6afaef) [0x7f9abb9bdaef]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 7: (()+0x7e25) [0x7f9ab80ece25]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 8: (clone()+0x6d) [0x7f9ab71dcbad]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 2018-12-20 02:27:00.343 7f9ab0202700 -1 *** Caught signal (Aborted) **
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: in thread 7f9ab0202700 thread_name:rdma-polling
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 1: (()+0x902970) [0x55c38d155970]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 2: (()+0xf6d0) [0x7f9ab80f46d0]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 3: (gsignal()+0x37) [0x7f9ab7114277]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 4: (abort()+0x148) [0x7f9ab7115968]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 5: (RDMADispatcher::polling()+0x1084) [0x7f9abb6e4c14]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 6: (()+0x6afaef) [0x7f9abb9bdaef]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 7: (()+0x7e25) [0x7f9ab80ece25]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 8: (clone()+0x6d) [0x7f9ab71dcbad]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: -4> 2018-12-20 02:27:00.342 7f9ab0202700 -1 RDMAStack polling poll failed -4
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 0> 2018-12-20 02:27:00.343 7f9ab0202700 -1 *** Caught signal (Aborted) **
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: in thread 7f9ab0202700 thread_name:rdma-polling
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 1: (()+0x902970) [0x55c38d155970]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 2: (()+0xf6d0) [0x7f9ab80f46d0]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 3: (gsignal()+0x37) [0x7f9ab7114277]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 4: (abort()+0x148) [0x7f9ab7115968]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 5: (RDMADispatcher::polling()+0x1084) [0x7f9abb6e4c14]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 6: (()+0x6afaef) [0x7f9abb9bdaef]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 7: (()+0x7e25) [0x7f9ab80ece25]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 8: (clone()+0x6d) [0x7f9ab71dcbad]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The above block repeats for each OSD.

Any advice on where to go from here would be much appreciated.
--
Michael Green
Customer Support & Integration
Tel. +1 (518) 9862385
green@xxxxxxxxxxxxx

E8 Storage has a new look, find out more

On Dec 19, 2018, at 3:26 PM, Roman Penyaev <rpenyaev@xxxxxxx> wrote:

On 2018-12-19 21:00, Michael Green wrote:
Thanks for the insights, Mohammad and Roman. Interesting read.
My interest in RDMA is purely from a testing perspective.
Still, I would be interested if somebody who has RDMA enabled and
running could share their ceph.conf.

Nothing special in my ceph.conf, only one line ms_cluster_type = async+rdma

My RDMA-related entries are taken from the Mellanox blog here:
https://community.mellanox.com/s/article/bring-up-ceph-rdma---developer-s-guide
They used Luminous and built it from source; I'm running the binary
distribution of Mimic here.
ms_type = async+rdma
ms_cluster = async+rdma
ms_async_rdma_device_name = mlx5_0
ms_async_rdma_polling_us = 0
ms_async_rdma_local_gid=<node's_gid>


ms_type = async+rdma should be enough, or ms_cluster_type=async+rdma,
i.e. all osds will be connected over rdma, but public network stays on
tcp sockets.

Others are optional.

Or, if somebody with knowledge of the code could tell me when this
"RDMAConnectedSocketImpl" error is printed, that might also be helpful.
2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981
crush map has features 288514051259236352, adjusting msgr requires
2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981
crush map has features 288514051259236352, adjusting msgr requires
2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981
crush map has features 1009089991638532096, adjusting msgr requires
2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981
crush map has features 288514051259236352, adjusting msgr requires
2018-12-19 21:45:33.138 7f52b8548140  0 mon.rio@-1(probing) e5  my
rank is now 0 (was -1)
2018-12-19 21:45:33.141 7f529f3fe700 -1  RDMAConnectedSocketImpl
activate failed to transition to RTR state: (113) No route to host

The error means: no route to host :)  The peers do not see each other.
I suggest first installing (or building) perftest and running ib_send_bw
to test connectivity between client and server. There are also some
test examples in libibverbs (rdma-core), e.g. ibv_rc_pingpong, which
are likewise good for benchmarking, testing, etc.

--
Roman



2018-12-19 21:45:33.142 7f529f3fe700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
In function 'void RDMAConnectedSocketImpl::handle_connection()' thread
7f529f3fe700 time 2018-12-19 21:45:33.141972
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
224: FAILED assert(!r)
--
Michael Green
On Dec 19, 2018, at 5:21 AM, Roman Penyaev <rpenyaev@xxxxxxx> wrote:
Well, I have been playing with the ceph rdma implementation for quite
a while and it has unsolved problems, thus I would say the status is
not "completely broken", but rather "you can run it at your own risk
and smile":
1. On disconnect of a previously active (high write load) connection
 there is a race that can lead to an osd (or any receiver) crash:
 https://github.com/ceph/ceph/pull/25447
2. Recent QLogic hardware (qedr drivers) does not support
 IBV_EVENT_QP_LAST_WQE_REACHED, which is used in the ceph rdma
 implementation; the pull request from 1. also targets this
 incompatibility.
3. Under high write load and many connections there is a chance
 that an osd can run out of receive WRs and the rdma connection (QP)
 on the sender side will get IBV_WC_RETRY_EXC_ERR, thus disconnected.
 This is a fundamental design problem, which has to be fixed at the
 protocol level (e.g. propagate backpressure to senders).
4. Unfortunately, neither rdma nor any other zero-latency network can
 bring significant value, because the bottleneck is not the network;
 please consider this for further reading regarding transport
 performance in ceph:
 https://www.spinics.net/lists/ceph-devel/msg43555.html
 The problems described above have quite a big impact on overall
 transport performance.
--
Roman


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
