Re: rdma-core: Bringing up IPoIB devices on boot fail

On Tue, May 15, 2018 at 04:47:22PM +0200, Benjamin Drung wrote:
> Hi,
> 
> I have a Debian 9 (stretch) system with a backported rdma-core 17.0-1
> package. The system has an mlx4 card (mlx4_ib and mlx4_core kernel
> modules) and the following network configuration in
> /etc/network/interfaces:
> 
> ```
> auto ib0.dddd
> iface ib0.dddd inet6 static
>     address fd44:1:5255::
>     netmask 64
>     pre-up echo connected > /sys/class/net/$IFACE/mode
>     dad-attempts 600
> 
> auto ib1.dddd
> iface ib1.dddd inet6 static
>     address fd44:2:5255::
>     netmask 64
>     pre-up echo connected > /sys/class/net/$IFACE/mode
>     dad-attempts 600
> ```
> 
> The terminal shows the following ordering:
> 
> ```
> [FAILED] Failed to start Raise network interfaces.
> [  OK  ] Started Load RDMA modules from /etc/rdma/modules/rdma.conf
> [  OK  ] Started Load RDMA modules from /etc/rdma/modules/infiniband.conf
> [  OK  ] Reached target RDMA Hardware.
> ```
> 
> The networking.service unit fails with:
> ```
> $ journalctl --no-host -u networking.service
> [...]
> Mai 15 13:16:40 ifup[1645]: /bin/sh: 1: cannot create /sys/class/net/ib0.dddd/mode: Directory nonexistent
> Mai 15 13:16:40 ifup[1645]: ifup: failed to bring up ib0.dddd
> Mai 15 13:16:40 ifup[1645]: /bin/sh: 1: cannot create /sys/class/net/ib1.dddd/mode: Directory nonexistent
> Mai 15 13:16:40 ifup[1645]: ifup: failed to bring up ib1.dddd
> Mai 15 13:16:40 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
> Mai 15 13:16:40 systemd[1]: Failed to start Raise network interfaces.
> Mai 15 13:16:40 systemd[1]: networking.service: Unit entered failed state.
> Mai 15 13:16:40 systemd[1]: networking.service: Failed with result 'exit-code'.
> 
> ```
> 
> The networking.service fails because it tries to bring up
> ib0.dddd/ib1.dddd before rdma-load-modules@infiniband.service loads
> the ib_ipoib kernel module. networking.service declares that it should
> run after network-pre.target, and
> rdma-load-modules@infiniband.service declares that it runs before
> network-pre.target. Therefore the order should be
> rdma-load-modules@infiniband.service -> network-pre.target ->
> networking.service, but this is obviously not the case.
> 
> I am writing to this mailing list because I got stuck debugging
> this issue and need your help.

The udev.md explains this:

 ## Interaction with legacy non-hotplug services

 Services that cannot handle hot plug must be ordered after
 systemd-udev-settle.service, which will wait for udev to complete loading
 modules and scheduling systemd services. This ensures that all RDMA hardware
 present at boot is setup before proceeding to run the legacy service.

 Admins using legacy services can also place their RDMA hardware modules
 (e.g.  mlx4_ib) directly in /etc/modules-load.d/ or in their initrd which will
 cause systemd to defer passing to sysinit.target until all RDMA hardware is
 setup, this is usually sufficient for legacy services. This is probably the
 default behavior in many configurations.

Since you see the backwards ordering and the errors, it means that
ifupdown in stretch does not support hotplug. IMHO it is a bug in that
package that it doesn't order itself after systemd-udev-settle.service
to avoid boot-time hotplug events that it cannot handle.
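
As a rough sketch of that ordering, a drop-in for networking.service
could look like the following. The drop-in file name is hypothetical,
and this assumes the unit is not already ordered after settle on your
system:

```
# /etc/systemd/system/networking.service.d/wait-for-udev.conf
# (hypothetical file name; any *.conf in this directory is read)
[Unit]
# Pull in the settle service and start only after it has finished,
# so boot-time module loading triggered by udev is complete.
Wants=systemd-udev-settle.service
After=systemd-udev-settle.service
```

A "systemctl daemon-reload" (or a reboot) is needed before the drop-in
takes effect.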

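The modules solution is the simplest: add ib_ipoib and the HCA drivers
(mlx4_ib in your case) to modules.conf, i.e. a file that
systemd-modules-load reads from /etc/modules-load.d/.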
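
A minimal sketch of such a file, assuming the mlx4 hardware from the
report (the file name is an assumption, and other HCAs need their own
driver module listed instead):

```
# /etc/modules-load.d/rdma-ipoib.conf (hypothetical file name)
# One module per line; loaded early in boot by systemd-modules-load.
mlx4_ib
ib_ipoib
```

With the modules loaded this early, the IPoIB parent devices should
already exist by the time networking.service runs.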

The robust and forward-looking solution is to use systemd-networkd
instead of legacy ifupdown...

Getting the connected mode setting with systemd-networkd is still a
bit annoying today, though.

Jason