Answering the first RDMA question myself...

On 18.02.2018 at 16:45, Oliver Freyermuth wrote:
> This leaves me with two questions:
> - Is it safe to use RDMA with 12.2.2 already? Reading through this mail
>   archive, I grasped it may lead to memory exhaustion and in any case needs
>   some hacks to the systemd service files.

I tried that on our cluster (the kind of systemd override involved is sketched at the end of this mail). While the cluster ran for a few minutes, I then hit many random disconnects: mons and mgrs disconnecting, OSDs vanishing, and no client able to connect. These are the very same issues described here:
https://tracker.ceph.com/issues/22944
I am also on CentOS 7.4 with ConnectX-3 cards, but I was not using a recent Mellanox OFED, just the InfiniBand stack that ships with CentOS 7.4. Hence, I reverted to IPoIB.

However, I got a significant performance improvement (> 2x) by switching from mode "datagram" with MTU 2044 to mode "connected" with MTU 65520, as outlined e.g. here:
https://wiki.gentoo.org/wiki/InfiniBand#Performance_tuning
(The exact commands are also at the end of this mail.) Total throughput in iperf (send + recv) is now about 30 Gbit/s. Even though this is not "perfect" (the hard drives are a bit bored...), it is sufficient for our use case and runs very stably. I'll try some sysctl tuning in the next days; a first sketch of what I have in mind is below as well.

> - Is it already clear whether RDMA will be part of 12.2.3?
>
> Also, of course, the final question from the last mail:
> "Why is data moved in a k=4 m=2 EC-pool with 6 hosts and failure domain
> "host" after failure of one host?"
> is still open.
>
> Many thanks already, this helped a lot to understand things better!
>
> Cheers,
> Oliver
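P.S. Regarding the "hacks to the systemd service files" from the quoted question: as far as I understand, this is about the memlock limit. RDMA needs to pin memory, and the default LimitMEMLOCK of the Ceph units is far too small for that. A minimal sketch, assuming the OSD units (mons and mgrs would need an equivalent drop-in):

    # /etc/systemd/system/ceph-osd@.service.d/rdma.conf
    # Drop-in override: allow the daemon to lock enough memory for RDMA.
    [Service]
    LimitMEMLOCK=infinity

After that, a "systemctl daemon-reload" and a restart of the daemons is needed.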
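For anyone who wants to reproduce the IPoIB change on CentOS 7, a sketch assuming the interface is named ib0 (interface names, and whether you manage them via ifcfg files, are of course site-specific):

    # Runtime switch (lost on reboot):
    echo connected > /sys/class/net/ib0/mode
    ip link set ib0 mtu 65520

    # Persistent, in /etc/sysconfig/network-scripts/ifcfg-ib0:
    CONNECTED_MODE=yes
    MTU=65520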
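As for the sysctl tuning I still want to try: what I have in mind are the usual TCP buffer knobs, roughly along the lines of the Gentoo wiki linked above. This is untested on our cluster so far, so take the values only as a starting point:

    # /etc/sysctl.d/90-ipoib.conf -- larger TCP buffers for the IPoIB link
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216

Apply with "sysctl --system".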