Hello, Phil!
I've never tried ConnectX-2 cards or the "repository" software versions, but my setup works pretty well with Mellanox OFED. AFAIK the latest OFED release (4.x) has dropped ConnectX-2 support, but you can try version 3.4.
Actually, according to my notes, everything went pretty smoothly, without any issues.
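If you go the OFED route, `ofed_info -s` tells you which release is installed. Here is a small bash sketch that flags a 4.x or newer install. It assumes `ofed_info -s` prints a string like `MLNX_OFED_LINUX-3.4-2.0.0.0:`, and it treats everything below 4.x as still supporting ConnectX-2, which is only my reading of the note above. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check whether an MLNX_OFED version string predates 4.x.
# Assumes the "MLNX_OFED_LINUX-<major>.<minor>-..." format printed by `ofed_info -s`.

ofed_major() {
  # "MLNX_OFED_LINUX-3.4-2.0.0.0:" -> "3.4-2.0.0.0:" -> "3"
  local s=${1#MLNX_OFED_LINUX-}
  echo "${s%%.*}"
}

ofed_supports_cx2() {
  # Exit status 0 for major versions below 4 (assumed to still support ConnectX-2)
  [ "$(ofed_major "$1")" -lt 4 ]
}

# Usage on a node with MLNX_OFED installed:
#   ofed_supports_cx2 "$(ofed_info -s)" && echo "this release should still support ConnectX-2"
```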
Do you see any messages or errors in dmesg or syslog? Which distribution and kernel versions are you using?
Here is what I did (on Ubuntu 16.04):
Installed Mellanox OFED (it has an automated installer, just run it from the ISO; if you have the most recent Linux distribution, you'll probably need to turn off the version check with the appropriate installer option).
Put IPoIB into connected mode (it's in datagram mode by default); I believe this might be relevant in your case:
sudo sed -i -e 's/SET_IPOIB_CM=auto/SET_IPOIB_CM=yes/g' /etc/infiniband/openib.conf
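To check which mode an interface is actually in, or to switch it without a reboot, the IPoIB driver exposes a `mode` attribute in sysfs (`/sys/class/net/<iface>/mode`, containing `datagram` or `connected`). A minimal bash sketch, assuming the interface is `ib0`; the `SYSFS_NET` variable is a hypothetical override that only exists so the functions can be tried against a fake directory. Writing to the attribute needs root and is a runtime change, so as far as I know the openib.conf edit is still what makes it stick after a reboot.

```shell
#!/usr/bin/env bash
# Sketch: read or change the IPoIB mode through sysfs.
# SYSFS_NET is a hypothetical override for testing; on a real host it is /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

ipoib_mode() {
  # Prints "datagram" or "connected" for the given interface (default: ib0)
  cat "$SYSFS_NET/${1:-ib0}/mode"
}

ipoib_set_connected() {
  # Runtime-only change; run as root on a real host
  echo connected > "$SYSFS_NET/${1:-ib0}/mode"
}

# Usage (as root on a node):
#   ipoib_mode ib0
#   ipoib_set_connected ib0 && ipoib_mode ib0
```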
Configured opensm; I have a number of partitions to isolate traffic for different purposes:
cat << 'EOF' | sudo tee /etc/opensm/partitions.conf
# For reference:
# IPv4 IANA reserved multicast addresses:
# http://www.iana.org/assignments/multicast-addresses/multicast-addresses.txt
# IPv6 IANA reserved multicast addresses:
# http://www.iana.org/assignments/ipv6-multicast-addresses/ipv6-multicast-addresses.xml
#
# mtu =
# 1 = 256
# 2 = 512
# 3 = 1024
# 4 = 2048
# 5 = 4096
#
# rate =
# 2 = 2.5 GBit/s
# 3 = 10 GBit/s
# 4 = 30 GBit/s
# 5 = 5 GBit/s
# 6 = 20 GBit/s
# 7 = 40 GBit/s
# 8 = 60 GBit/s
# 9 = 80 GBit/s
# 10 = 120 GBit/s
Default=0x7fff, rate=7, mtu=4, scope=2, defmember=full:
ALL, ALL_SWITCHES=full;
Default=0x7fff, ipoib, rate=7, mtu=4, scope=2:
mgid=ff12:401b::ffff:ffff # IPv4 Broadcast address
mgid=ff12:401b::1 # IPv4 All Hosts group
mgid=ff12:401b::2 # IPv4 All Routers group
mgid=ff12:401b::16 # IPv4 IGMP group
mgid=ff12:401b::fb # IPv4 mDNS group
mgid=ff12:401b::fc # IPv4 Multicast Link Local Name Resolution group
mgid=ff12:401b::101 # IPv4 NTP group
mgid=ff12:401b::202 # IPv4 Sun RPC
mgid=ff12:601b::1 # IPv6 All Hosts group
mgid=ff12:601b::2 # IPv6 All Routers group
mgid=ff12:601b::16 # IPv6 MLDv2-capable Routers group
mgid=ff12:601b::fb # IPv6 mDNS group
mgid=ff12:601b::101 # IPv6 NTP group
mgid=ff12:601b::202 # IPv6 Sun RPC group
mgid=ff12:601b::1:3 # IPv6 Multicast Link Local Name Resolution group
ALL=full, ALL_SWITCHES=full;
Public=0x0003, rate=7, mtu=4, scope=2, defmember=full:
ALL, ALL_SWITCHES=full;
Public=0x0003, ipoib, rate=7, mtu=4, scope=2:
mgid=ff12:401b::ffff:ffff # IPv4 Broadcast address
mgid=ff12:401b::1 # IPv4 All Hosts group
mgid=ff12:401b::2 # IPv4 All Routers group
mgid=ff12:401b::16 # IPv4 IGMP group
mgid=ff12:401b::fb # IPv4 mDNS group
mgid=ff12:401b::fc # IPv4 Multicast Link Local Name Resolution group
mgid=ff12:401b::101 # IPv4 NTP group
mgid=ff12:401b::202 # IPv4 Sun RPC
mgid=ff12:601b::1 # IPv6 All Hosts group
mgid=ff12:601b::2 # IPv6 All Routers group
mgid=ff12:601b::16 # IPv6 MLDv2-capable Routers group
mgid=ff12:601b::fb # IPv6 mDNS group
mgid=ff12:601b::101 # IPv6 NTP group
mgid=ff12:601b::202 # IPv6 Sun RPC group
mgid=ff12:601b::1:3 # IPv6 Multicast Link Local Name Resolution group
ALL=full, ALL_SWITCHES=full;
Storage=0x0004, rate=7, mtu=4, scope=2, defmember=full:
ALL, ALL_SWITCHES=full;
Storage=0x0004, ipoib, rate=7, mtu=4, scope=2:
mgid=ff12:401b::ffff:ffff # IPv4 Broadcast address
mgid=ff12:401b::1 # IPv4 All Hosts group
mgid=ff12:401b::2 # IPv4 All Routers group
mgid=ff12:401b::16 # IPv4 IGMP group
mgid=ff12:401b::fb # IPv4 mDNS group
mgid=ff12:401b::fc # IPv4 Multicast Link Local Name Resolution group
mgid=ff12:401b::101 # IPv4 NTP group
mgid=ff12:401b::202 # IPv4 Sun RPC
mgid=ff12:601b::1 # IPv6 All Hosts group
mgid=ff12:601b::2 # IPv6 All Routers group
mgid=ff12:601b::16 # IPv6 MLDv2-capable Routers group
mgid=ff12:601b::fb # IPv6 mDNS group
mgid=ff12:601b::101 # IPv6 NTP group
mgid=ff12:601b::202 # IPv6 Sun RPC group
mgid=ff12:601b::1:3 # IPv6 Multicast Link Local Name Resolution group
ALL=full, ALL_SWITCHES=full;
Storage=0x0005, rate=7, mtu=4, scope=2, defmember=full:
ALL, ALL_SWITCHES=full;
Storage=0x0005, ipoib, rate=7, mtu=4, scope=2:
mgid=ff12:401b::ffff:ffff # IPv4 Broadcast address
mgid=ff12:401b::1 # IPv4 All Hosts group
mgid=ff12:401b::2 # IPv4 All Routers group
mgid=ff12:401b::16 # IPv4 IGMP group
mgid=ff12:401b::fb # IPv4 mDNS group
mgid=ff12:401b::fc # IPv4 Multicast Link Local Name Resolution group
mgid=ff12:401b::101 # IPv4 NTP group
mgid=ff12:401b::202 # IPv4 Sun RPC
mgid=ff12:601b::1 # IPv6 All Hosts group
mgid=ff12:601b::2 # IPv6 All Routers group
mgid=ff12:601b::16 # IPv6 MLDv2-capable Routers group
mgid=ff12:601b::fb # IPv6 mDNS group
mgid=ff12:601b::101 # IPv6 NTP group
mgid=ff12:601b::202 # IPv6 Sun RPC group
mgid=ff12:601b::1:3 # IPv6 Multicast Link Local Name Resolution group
ALL=full, ALL_SWITCHES=full;
EOF
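In case it helps to see where the IPv4 `mgid=` values come from: RFC 4391 maps an IPv4 multicast group to an IB MGID laid out as `ff1<scope>:401b:<P_Key>:0000:0000:0000` followed by the low 28 bits of the group address, and the broadcast group uses `ffff:ffff` in the last 32 bits. The entries above use scope 2 and leave the P_Key field zero. A small bash sketch that reproduces the IPv4 entries under those assumptions (the function name is made up):

```shell
#!/usr/bin/env bash
# Sketch: IPv4 multicast group -> IPoIB MGID (RFC 4391 layout), assuming
# scope 2 (link-local) and a zero P_Key field, as in the partitions.conf above.
# Output uses the same compressed notation as the file: ff12:401b::<group bits>.
ipv4_mgid() {
  local ip=$1 a b c d v hi lo
  if [ "$ip" = "255.255.255.255" ]; then
    # Broadcast is special-cased to ffff:ffff instead of the low 28 bits
    echo "ff12:401b::ffff:ffff"
    return
  fi
  IFS=. read -r a b c d <<< "$ip"
  # Keep only the low 28 bits of the IPv4 group address
  v=$(( ((a << 24) | (b << 16) | (c << 8) | d) & 0x0fffffff ))
  hi=$(( v >> 16 ))
  lo=$(( v & 0xffff ))
  if [ "$hi" -eq 0 ]; then
    printf 'ff12:401b::%x\n' "$lo"
  else
    printf 'ff12:401b::%x:%x\n' "$hi" "$lo"
  fi
}

# Usage: ipv4_mgid 224.0.0.251   # mDNS, prints ff12:401b::fb
```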
I believe in your case you need just the first block (the default partition, with key 0x7fff). Also check the rate ID: I have QDR IB, so it's 7 (40 Gbit/s).
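The `rate` and `mtu` values are just the codes from the comment block at the top of the file. A small bash lookup for them, as a sketch with hypothetical function names: a 4X QDR link like mine is 40 Gbit/s (code 7), while a 4X SDR link, which is most likely what your SDR switch gives you, is 10 Gbit/s (code 3). `ibstat` shows the current link rate on its `Rate:` line, so that is a good place to check.

```shell
#!/usr/bin/env bash
# Sketch: map link speed (Gbit/s) and IB MTU (bytes) to the OpenSM partition
# codes listed in the comment block of partitions.conf above.
ib_rate_code() {
  case "$1" in
    2.5) echo 2 ;;
    10)  echo 3 ;;
    30)  echo 4 ;;
    5)   echo 5 ;;
    20)  echo 6 ;;
    40)  echo 7 ;;
    60)  echo 8 ;;
    80)  echo 9 ;;
    120) echo 10 ;;
    *)   echo "unknown rate: $1" >&2; return 1 ;;
  esac
}

ib_mtu_code() {
  case "$1" in
    256)  echo 1 ;;
    512)  echo 2 ;;
    1024) echo 3 ;;
    2048) echo 4 ;;
    4096) echo 5 ;;
    *)    echo "unknown MTU: $1" >&2; return 1 ;;
  esac
}

# Usage on a node (assumes ibstat prints a "Rate: <N>" line for the port):
#   ib_rate_code "$(ibstat | awk '/Rate:/ {print $2; exit}')"
```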
Enabled OpenSM (but you've already done that if you are able to ibping nodes by GUID).
After that, set the IP addresses; in my case it's done like this (one file per partition/VLAN):
cat << 'EOF' | sudo tee /etc/network/interfaces.d/ib0.8003
auto ib0.8003
iface ib0.8003 inet static
address 10.103.0.XXX
netmask 255.255.0.0
post-up ifconfig $IFACE mtu 65520
EOF
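The number in the interface name is the partition key with the full-membership bit (0x8000) set, so `Public=0x0003` from partitions.conf becomes `ib0.8003`; the default partition 0x7fff likewise corresponds to 0xffff, the P_Key that plain ib0 normally uses. A minimal bash sketch that derives the name and prints a matching interfaces.d stanza; the helper names are hypothetical, and the address and netmask are whatever you use on each node:

```shell
#!/usr/bin/env bash
# Sketch: derive the IPoIB child interface name from a partition key
# and print an ifupdown stanza like the one above.

ipoib_child_name() {
  # $1 = parent interface, $2 = P_Key from partitions.conf (e.g. 0x0003)
  # The top bit (0x8000) marks full membership.
  printf '%s.%04x\n' "$1" $(( $2 | 0x8000 ))
}

ipoib_stanza() {
  # $1 = parent, $2 = P_Key, $3 = address, $4 = netmask
  local name
  name=$(ipoib_child_name "$1" "$2")
  cat <<EOF
auto $name
iface $name inet static
address $3
netmask $4
post-up ifconfig \$IFACE mtu 65520
EOF
}

# Usage (10.103.0.10 is a placeholder address):
#   ipoib_stanza ib0 0x0003 10.103.0.10 255.255.0.0 | sudo tee /etc/network/interfaces.d/ib0.8003
```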
Reboot the host, and after that:
admin@e001n01:~$ ping -c 5 10.101.0.2
PING 10.101.0.2 (10.101.0.2) 56(84) bytes of data.
64 bytes from 10.101.0.2: icmp_seq=1 ttl=64 time=0.138 ms
64 bytes from 10.101.0.2: icmp_seq=2 ttl=64 time=0.156 ms
64 bytes from 10.101.0.2: icmp_seq=3 ttl=64 time=0.139 ms
64 bytes from 10.101.0.2: icmp_seq=4 ttl=64 time=0.146 ms
64 bytes from 10.101.0.2: icmp_seq=5 ttl=64 time=0.140 ms
--- 10.101.0.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4072ms
rtt min/avg/max/mdev = 0.138/0.143/0.156/0.016 ms
2017-12-19 12:35 GMT+05:00 Phil Schwarz <infolist@xxxxxxxxxxxxxx>:
Hi,
I'm currently trying to set up a brand-new home cluster:
- 5 nodes, each with:
  - 1 Mellanox ConnectX-2 HCA
  - 1 Gb Ethernet (Proxmox 5.1 network admin)
  - 1 CX4-to-CX4 cable
All connected to an SDR Flextronics IB switch.
This setup should back a Ceph Luminous cluster (v12.2.2, included in Proxmox v5.1).
On all nodes, I did:
- apt-get install infiniband-diags
- modprobe mlx4_ib
- modprobe ib_ipoib
- modprobe ib_umad
- ifconfig ib0 IP/MASK
On two nodes (I previously tried on a single one, with the same issue), I installed
opensm (the switch doesn't include an SM):
apt-get install opensm
/etc/init.d/opensm stop
/etc/init.d/opensm start
(necessary to let the daemon create the log files)
I tailed the log file and got an "Active&Running" setup, with "SUBNET UP".
Every node looks OK as far as the IB setup is concerned:
- All ib0 ports are UP, according to ibstat
- ibhosts and ibswitches seem to be OK
On one node:
ibping -S
On every other node:
ibping -G GID_Of_Previous_Server_Port
I got a nice pong reply on every node. I should be happy, but...
But I never got any further. I tried to ping the nodes from each other, with no luck. I can't get past this (most probably) simple issue...
Any hints on how to achieve this?
Thanks to all
Best regards
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Best regards,
Дробышевский Владимир (Vladimir Drobyshevskiy)
Company "АйТи Город" (IT Gorod)
+7 343 2222192
IT consulting
Turnkey project delivery
IT services outsourcing
IT infrastructure outsourcing