Re: rdma-core: Bringing up IPoIB devices on boot fail

On Fri, May 18, 2018 at 11:26:07AM +0200, Benjamin Drung wrote:
> On Thursday, 17.05.2018 at 11:16 -0600, Jason Gunthorpe wrote:
> > On Thu, May 17, 2018 at 07:02:47PM +0200, Benjamin Drung wrote:
> > > On Tuesday, 15.05.2018 at 13:20 -0600, Jason Gunthorpe wrote:
> > > > On Tue, May 15, 2018 at 02:15:54PM -0400, Doug Ledford wrote:
> > > > > > I added the systemd-udev-settle.service dependency:
> > > > > > 
> > > > > > ```
> > > > > > $ systemctl cat networking.service 
> > > > > > # /lib/systemd/system/networking.service
> > > > > > [Unit]
> > > > > > Description=Raise network interfaces
> > > > > > Documentation=man:interfaces(5)
> > > > > > DefaultDependencies=no
> > > > > > Wants=network.target
> > > > > > After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service
> > > > > > Before=network.target shutdown.target network-online.target
> > > > > > Conflicts=shutdown.target
> > > > > > 
> > > > > > [Install]
> > > > > > WantedBy=multi-user.target
> > > > > > WantedBy=network-online.target
> > > > > > 
> > > > > > [Service]
> > > > > > Type=oneshot
> > > > > > EnvironmentFile=-/etc/default/networking
> > > > > > ExecStartPre=-/bin/sh -c '[ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle'
> > > > > > ```
> > > > > 
> > > > > I wouldn't trust that you can run udevadm settle here and get the
> > > > > right results.  This will only wait for the current udev hotplug
> > > > > events to complete.
> > > > 
> > > > Oh, neat, so udev settle is already called by Debian's
> > > > networking.service (as it should be) - assuming CONFIGURE_INTERFACES
> > > > is set, and whatever that other stuff does (Ben is this triggering
> > > > for you?)
> > > 
> > > I should have looked more closely at the service file (I didn't notice
> > > the udevadm settle in there). CONFIGURE_INTERFACES is not set in
> > > /etc/default/networking and ifquery returns a bunch of interfaces.
> > > Therefore 'udevadm settle' is executed.
> > > 
> > > I tried to debug it further by injecting commands into the pre-up hook.
> > > When pre-up runs:
> > > 
> > > * lsmod shows that ib_ipoib is loaded
> > > * 'ls -l /sys/class/net/' shows that neither ib0 nor ib1 is present
> > > 
> > > To me it looks like a race condition between populating
> > > /sys/class/net/ibX after loading ib_ipoib and the networking
> > > service.
> > 
> > Is the rdma device present at this point? e.g. /sys/class/infiniband?
> 
> /sys/class/infiniband/mlx4_0 is present.
> 
> > Are any systemd-modules-load processes still running?
> 
> '/lib/systemd/systemd-modules-load /etc/rdma/modules/infiniband.conf'
> is still running.

> > Are the mlx IB modules loaded?
> 
> Yes: mlx4_ib, mlx4_core, and mlx_compat are loaded (according to
> lsmod). The first two modules are already loaded in the initrd. Also
> ib_ipoib, ib_uverbs, ib_sa, ib_mad, ib_core, ib_addr, ib_netlink are
> loaded.

Hmm, that is very mysterious, then. I can't think how systemd-modules-load
could still be running at this point.

If you load the IB driver in the initrd, then the above should have been
scheduled very early in boot, and it has a Before=network-pre.target,
which should keep networking.service from starting while it is running.

What does the logging say about when rdma-load-modules was started? And
was the IB device created before the initrd exited?
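
To narrow down the timing from the pre-up hook, a minimal sketch of a
helper that polls for the netdev instead of looking only once might help.
The function name wait_for_netdev, its arguments and the 30 second default
are hypothetical; nothing here comes from ifupdown or rdma-core:

```shell
#!/bin/sh
# Hypothetical debugging helper: poll until a network interface shows up
# in sysfs, and report how long it took.
# Usage: wait_for_netdev IFACE [TIMEOUT_SECONDS] [SYSFS_NET_DIR]
wait_for_netdev() {
    iface=$1
    timeout=${2:-30}                # assumed default, adjust as needed
    netdir=${3:-/sys/class/net}     # overridable so it can be tested
    elapsed=0
    while [ ! -e "$netdir/$iface" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${elapsed}s waiting for $iface" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$iface present after ${elapsed}s" >&2
    return 0
}
```

Calling something like `wait_for_netdev ib0 30` from the pre-up hook would
log roughly how many seconds passed before ib0 appeared. It is a debugging
aid only and does not fix the ordering problem.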

Jason