Re: rdma-core: Bringing up IPoIB devices on boot fail

On Tuesday, 15.05.2018 at 08:58 -0600, Jason Gunthorpe wrote:
> On Tue, May 15, 2018 at 04:47:22PM +0200, Benjamin Drung wrote:
> > Hi,
> > 
> > I have a Debian 9 (stretch) system with a backported rdma-core 17.0-1
> > package. The system has an mlx4 card (mlx4_ib and mlx4_core kernel
> > modules) and the following network configuration in
> > /etc/network/interfaces:
> > 
> > ```
> > auto ib0.dddd
> > iface ib0.dddd inet6 static
> >     address fd44:1:5255::
> >     netmask 64
> >     pre-up echo connected > /sys/class/net/$IFACE/mode
> >     dad-attempts 600
> > 
> > auto ib1.dddd
> > iface ib1.dddd inet6 static
> >     address fd44:2:5255::
> >     netmask 64
> >     pre-up echo connected > /sys/class/net/$IFACE/mode
> >     dad-attempts 600
> > ```
> > 
> > The terminal shows the following ordering:
> > 
> > ```
> > [FAILED] Failed to start Raise network interfaces.
> > [  OK  ] Started Load RDMA modules from /etc/rdma/modules/rdma.conf
> > [  OK  ] Started Load RDMA modules from /etc/rdma/modules/infiniband.conf
> > [  OK  ] Reached target RDMA Hardware.
> > ```
> > 
> > The networking.service fails with:
> > ```
> > $ journalctl --no-host -u networking.service
> > [...]
> > Mai 15 13:16:40 ifup[1645]: /bin/sh: 1: cannot create /sys/class/net/ib0.dddd/mode: Directory nonexistent
> > Mai 15 13:16:40 ifup[1645]: ifup: failed to bring up ib0.dddd
> > Mai 15 13:16:40 ifup[1645]: /bin/sh: 1: cannot create /sys/class/net/ib1.dddd/mode: Directory nonexistent
> > Mai 15 13:16:40 ifup[1645]: ifup: failed to bring up ib1.dddd
> > Mai 15 13:16:40 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
> > Mai 15 13:16:40 systemd[1]: Failed to start Raise network interfaces.
> > Mai 15 13:16:40 systemd[1]: networking.service: Unit entered failed state.
> > Mai 15 13:16:40 systemd[1]: networking.service: Failed with result 'exit-code'.
> > ```
> > 
> > networking.service fails because it tries to bring up
> > ib0.dddd/ib1.dddd before rdma-load-modules@infiniband.service loads
> > the ib_ipoib kernel module. networking.service declares that it should
> > run after network-pre.target, and rdma-load-modules@infiniband.service
> > declares that it runs before network-pre.target. Therefore the order
> > should be rdma-load-modules@infiniband.service -> network-pre.target ->
> > networking.service, but this is obviously not the case.
> > 
> > I am writing to this mailing list because I got stuck debugging
> > this issue and need your help.
> 
> The udev.md explains this:
> 
>  ## Interaction with legacy non-hotplug services
> 
>  Services that cannot handle hot plug must be ordered after
>  systemd-udev-settle.service, which will wait for udev to complete loading
>  modules and scheduling systemd services. This ensures that all RDMA hardware
>  present at boot is setup before proceeding to run the legacy service.
> 
>  Admins using legacy services can also place their RDMA hardware modules
>  (e.g. mlx4_ib) directly in /etc/modules-load.d/ or in their initrd which will
>  cause systemd to defer passing to sysinit.target until all RDMA hardware is
>  setup, this is usually sufficient for legacy services. This is probably the
>  default behavior in many configurations.
> 
> Since you see the backwards ordering and the errors, it means that
> ifupdown in stretch does not support hotplug. IMHO it is a bug in that
> package that it doesn't order after settle to try and avoid boot time
> hot plug events that it cannot handle.
> 
> The modules solution is simplest: add ipoib and HCA drivers to
> modules.conf.

I added the systemd-udev-settle.service dependency:

```
$ systemctl cat networking.service 
# /lib/systemd/system/networking.service
[Unit]
Description=Raise network interfaces
Documentation=man:interfaces(5)
DefaultDependencies=no
Wants=network.target
After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service
Before=network.target shutdown.target network-online.target
Conflicts=shutdown.target

[Install]
WantedBy=multi-user.target
WantedBy=network-online.target

[Service]
Type=oneshot
EnvironmentFile=-/etc/default/networking
ExecStartPre=-/bin/sh -c '[ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle'
ExecStart=/sbin/ifup -a --read-environment
ExecStop=/sbin/ifdown -a --read-environment --exclude=lo
RemainAfterExit=true
TimeoutStartSec=5min

# /etc/systemd/system/networking.service.d/rdma.conf
[Unit]
# See https://marc.info/?l=linux-rdma&m=152639629213650&w=2
After=systemd-udev-settle.service
```

but it is still not working (same error messages).
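
For reference, the modules-load.d approach described in the quoted udev.md text might look roughly like the fragment below. This is a minimal sketch, not a tested fix: the file name rdma-ipoib.conf is a hypothetical choice, and the module list assumes the mlx4 hardware from this report (mlx4_ib as the HCA driver, ib_ipoib for IPoIB). Since networking.service is already ordered after systemd-modules-load.service (see the unit above), modules listed this way should be loaded before ifup runs. Whether that alone is enough for the ib0.dddd/ib1.dddd child interfaces is not established here.

```
# /etc/modules-load.d/rdma-ipoib.conf  (hypothetical file name)
# One kernel module name per line; read by systemd-modules-load.service.
mlx4_ib
ib_ipoib
```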

-- 
Benjamin Drung
System Developer
Debian & Ubuntu Developer

ProfitBricks GmbH
Greifswalder Str. 207
10405 Berlin

Email: benjamin.drung@xxxxxxxxxxxxxxxx
URL: https://www.profitbricks.de

Registered office: Berlin
Court of registration: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens