Re: Ceph over IP over Infiniband

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello, Phil!

Never tried ConnectX-2 and "repository" software versions but my setup feels pretty good with Mellanox OFED. AFAIK the latest OFED version (4.x) has dropped Connect-X2 support but you can try 3.4 version. 
Actually according to my notes all went pretty well without any issues.

Any dmesg or syslog messages/issues? Distribution\kernel versions?

What have I done (on Ubuntu 16.04):

Installed Mellanox OFED (it has an automated installed, just run it from ISO; if you have the most recent Linux distribution you'll probably need to turn off version check with an appropriate installer option).

Put IPoIB into connected mode (it's in the datagramm mode by default) [i believe this might be the case]:

sudo sed -i -e 's/SET_IPOIB_CM=auto/SET_IPOIB_CM=yes/g' /etc/infiniband/openib.conf

Configured opensm, I have a number of partitions to isolate different proposed traffic:

cat << 'EOF' | sudo tee /etc/opensm/partitions.conf
# For reference:
# IPv4 IANA reserved multicast addresses:
#   http://www.iana.org/assignments/multicast-addresses/multicast-addresses.txt
# IPv6 IANA reserved multicast addresses:
#   http://www.iana.org/assignments/ipv6-multicast-addresses/ipv6-multicast-addresses.xml
#
# mtu =
#   1 = 256
#   2 = 512
#   3 = 1024
#   4 = 2048
#   5 = 4096
#
# rate =
#   2  =   2.5 GBit/s
#   3  =  10   GBit/s
#   4  =  30   GBit/s
#   5  =   5   GBit/s
#   6  =  20   GBit/s
#   7  =  40   GBit/s
#   8  =  60   GBit/s
#   9  =  80   GBit/s
#   10 = 120   GBit/s
Default=0x7fff, rate=7, mtu=4, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
Default=0x7fff, ipoib, rate=7, mtu=4, scope=2:
        mgid=ff12:401b::ffff:ffff       # IPv4 Broadcast address
        mgid=ff12:401b::1               # IPv4 All Hosts group
        mgid=ff12:401b::2               # IPv4 All Routers group
        mgid=ff12:401b::16              # IPv4 IGMP group
        mgid=ff12:401b::fb              # IPv4 mDNS group
        mgid=ff12:401b::fc              # IPv4 Multicast Link Local Name Resolution group
        mgid=ff12:401b::101             # IPv4 NTP group
        mgid=ff12:401b::202             # IPv4 Sun RPC
        mgid=ff12:601b::1               # IPv6 All Hosts group
        mgid=ff12:601b::2               # IPv6 All Routers group
        mgid=ff12:601b::16              # IPv6 MLDv2-capable Routers group
        mgid=ff12:601b::fb              # IPv6 mDNS group
        mgid=ff12:601b::101             # IPv6 NTP group
        mgid=ff12:601b::202             # IPv6 Sun RPC group
        mgid=ff12:601b::1:3             # IPv6 Multicast Link Local Name Resolution group
        ALL=full, ALL_SWITCHES=full;
Public=0x0003, rate=7, mtu=4, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
Public=0x0003, ipoib, rate=7, mtu=4, scope=2:
        mgid=ff12:401b::ffff:ffff       # IPv4 Broadcast address
        mgid=ff12:401b::1               # IPv4 All Hosts group
        mgid=ff12:401b::2               # IPv4 All Routers group
        mgid=ff12:401b::16              # IPv4 IGMP group
        mgid=ff12:401b::fb              # IPv4 mDNS group
        mgid=ff12:401b::fc              # IPv4 Multicast Link Local Name Resolution group
        mgid=ff12:401b::101             # IPv4 NTP group
        mgid=ff12:401b::202             # IPv4 Sun RPC
        mgid=ff12:601b::1               # IPv6 All Hosts group
        mgid=ff12:601b::2               # IPv6 All Routers group
        mgid=ff12:601b::16              # IPv6 MLDv2-capable Routers group
        mgid=ff12:601b::fb              # IPv6 mDNS group
        mgid=ff12:601b::101             # IPv6 NTP group
        mgid=ff12:601b::202             # IPv6 Sun RPC group
        mgid=ff12:601b::1:3             # IPv6 Multicast Link Local Name Resolution group
        ALL=full, ALL_SWITCHES=full;
Storage=0x0004, rate=7, mtu=4, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
Storage=0x0004, ipoib, rate=7, mtu=4, scope=2:
        mgid=ff12:401b::ffff:ffff       # IPv4 Broadcast address
        mgid=ff12:401b::1               # IPv4 All Hosts group
        mgid=ff12:401b::2               # IPv4 All Routers group
        mgid=ff12:401b::16              # IPv4 IGMP group
        mgid=ff12:401b::fb              # IPv4 mDNS group
        mgid=ff12:401b::fc              # IPv4 Multicast Link Local Name Resolution group
        mgid=ff12:401b::101             # IPv4 NTP group
        mgid=ff12:401b::202             # IPv4 Sun RPC
        mgid=ff12:601b::1               # IPv6 All Hosts group
        mgid=ff12:601b::2               # IPv6 All Routers group
        mgid=ff12:601b::16              # IPv6 MLDv2-capable Routers group
        mgid=ff12:601b::fb              # IPv6 mDNS group
        mgid=ff12:601b::101             # IPv6 NTP group
        mgid=ff12:601b::202             # IPv6 Sun RPC group
        mgid=ff12:601b::1:3             # IPv6 Multicast Link Local Name Resolution group
        ALL=full, ALL_SWITCHES=full;
Storage=0x0005, rate=7, mtu=4, scope=2, defmember=full:
        ALL, ALL_SWITCHES=full;
Storage=0x0005, ipoib, rate=7, mtu=4, scope=2:
        mgid=ff12:401b::ffff:ffff       # IPv4 Broadcast address
        mgid=ff12:401b::1               # IPv4 All Hosts group
        mgid=ff12:401b::2               # IPv4 All Routers group
        mgid=ff12:401b::16              # IPv4 IGMP group
        mgid=ff12:401b::fb              # IPv4 mDNS group
        mgid=ff12:401b::fc              # IPv4 Multicast Link Local Name Resolution group
        mgid=ff12:401b::101             # IPv4 NTP group
        mgid=ff12:401b::202             # IPv4 Sun RPC
        mgid=ff12:601b::1               # IPv6 All Hosts group
        mgid=ff12:601b::2               # IPv6 All Routers group
        mgid=ff12:601b::16              # IPv6 MLDv2-capable Routers group
        mgid=ff12:601b::fb              # IPv6 mDNS group
        mgid=ff12:601b::101             # IPv6 NTP group
        mgid=ff12:601b::202             # IPv6 Sun RPC group
        mgid=ff12:601b::1:3             # IPv6 Multicast Link Local Name Resolution group
        ALL=full, ALL_SWITCHES=full;
EOF

I believe in your case you need just the first block (default partition, with key: 0x7fff). Also check rate id, I have QDR IB, so it's 7 (40Gbit\s)
Enabled OpenSM (but you've already done if you are able to ibping nodes by GUIDs).

after that set IP addresses, in my case it's done like this (for every partition\VLAN):
cat << 'EOF' | sudo tee /etc/network/interfaces.d/ib0.8003
auto ib0.8003
iface ib0.8003 inet static
  address 10.103.0.XXX
  netmask 255.255.0.0
  post-up ifconfig $IFACE mtu 65520
EOF

reboot the host and after that:
admin@e001n01:~$ ping -c 5 10.101.0.2
PING 10.101.0.2 (10.101.0.2) 56(84) bytes of data.
64 bytes from 10.101.0.2: icmp_seq=1 ttl=64 time=0.138 ms
64 bytes from 10.101.0.2: icmp_seq=2 ttl=64 time=0.156 ms
64 bytes from 10.101.0.2: icmp_seq=3 ttl=64 time=0.139 ms
64 bytes from 10.101.0.2: icmp_seq=4 ttl=64 time=0.146 ms
64 bytes from 10.101.0.2: icmp_seq=5 ttl=64 time=0.140 ms
--- 10.101.0.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4072ms
rtt min/avg/max/mdev = 0.138/0.143/0.156/0.016 ms

 

2017-12-19 12:35 GMT+05:00 Phil Schwarz <infolist@xxxxxxxxxxxxxx>:
Hi,
I'm currently trying to set up a brand new home cluster :
- 5 nodes, with each :

- 1 HCA Mellanox ConnectX-2
- 1 GB Ethernet (Proxmox 5.1 Network Admin)
- 1 CX4 to CX4 cable

All together connected to a SDR Flextronics IB Switch.

This setup should back a Ceph Luminous (V12.2.2 included in proxmox
V5.1) On all nodes, I did:
- apt-get infiniband-diags
- modprobe mlx4_ib
- modprobe ib_ipoib
- modprobe ib_umad
- ifconfig ib0 IP/MASK

On two nodes (tried previously on a single on, same issue), i installed
opensm ( The switch doesn't have SM included) :
apt-get install opensm
/etc/init.d/opensm stop
/etc/init.d/opensm start
(Necessary to let the daemon create the logfiles)

I tailed the logfile and got a "Active&Running" Setup, with "SUBNET UP"

Every node is OK regardless to IB Setup :
- All ib0 are UP, using ibstat
- ibhosts and ibswitches seem to be OK

On a node :
ibping -S

On every other node :
ibping -G GID_Of_Previous_Server_Port

I got a nice pong reply on every node. Should be happy, but...
But i never went further.. Tried to ping each other. No way to get into
this (mostly probably) simple issue...


Any hint to achieve this task ??


Thanks for all
Best regards

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--

С уважением,
Дробышевский Владимир
Компания "АйТи Город"
+7 343 2222192

ИТ-консалтинг
Поставка проектов "под ключ"
Аутсорсинг ИТ-услуг
Аутсорсинг ИТ-инфраструктуры
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux