Race condition between / wrong load order of ib_umad and ib_ipoib

Hi,

after a kernel upgrade to version 4.19 (in-house built with Mellanox
OFED drivers), some of our systems fail to bring up their IPoIB devices
on boot. Different HCAs are affected (e.g. MT4099 and MT26428). We are
using rdma-core on Debian and have IPoIB devices (like `ib0.dddd`)
configured in `/etc/network/interfaces`. Big clusters seem to be more
affected than smaller ones. When the failure occurs, we see this kernel
message:

```
ib0.dddd: P_Key 0xdddd is not found
```

Pinging other hosts then fails with:

```
ping: sendmsg: Network is unreachable
```

Upgrading to rdma-core 29.0 did not change anything. Excluding all
InfiniBand kernel modules from the initrd reduced the likelihood of
running into this issue, but did not fix it.

We found one report on the Internet describing a similar issue, which
claims that the solution is to fix the module load order:
https://community.brightcomputing.com/question/5d6614ba08e8e81e885f18ef

We use the default `/etc/rdma/modules/infiniband.conf` shipped in the
Debian package:

```
# These modules are loaded by the system if any InfiniBand device is installed
# InfiniBand over IP netdevice
ib_ipoib

# Access to fabric management SMPs and GMPs from userspace.
ib_umad

# SCSI Remote Protocol target support
# ib_srpt

# ib_ucm provides the obsolete /dev/infiniband/ucm0
# ib_ucm
```

Due to this configuration, `ib_ipoib` is loaded before `ib_umad`. After
changing the order in this configuration file to load `ib_umad` before
`ib_ipoib`, the servers come up correctly.
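
For illustration, a sketch of the reordered file, assuming only the two
active entries are swapped and the rest of the packaged default is left
unchanged. It is a workaround; the underlying cause of the race has not
been identified here:

```
# These modules are loaded by the system if any InfiniBand device is installed
# Access to fabric management SMPs and GMPs from userspace.
# (moved up so that it is loaded before ib_ipoib)
ib_umad

# InfiniBand over IP netdevice
ib_ipoib

# SCSI Remote Protocol target support
# ib_srpt

# ib_ucm provides the obsolete /dev/infiniband/ucm0
# ib_ucm
```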

-- 
Benjamin Drung

DevOps Engineer and Debian & Ubuntu Developer
Platform Integration (IONOS Cloud)

1&1 IONOS SE | Greifswalder Str. 207 | 10405 Berlin | Germany
E-mail: benjamin.drung@xxxxxxxxxxxxxxx | Web: www.ionos.de

Registered office: Montabaur, Montabaur District Court, HRB 24498

Management Board: Dr. Christian Böing, Hüseyin Dogan, Dr. Martin Endreß,
Hans-Henning Kettler, Arthur Mai, Matthias Steinberg, Achim Weiß
Chairman of the Supervisory Board: Markus Kadelke


Member of United Internet

This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient of this e-mail, you are hereby
notified that saving, distribution or use of the content of this e-mail 
in any way is prohibited. If you have received this e-mail in error,
please notify the sender and delete the e-mail.



