Re: rdma-core: Bringing up IPoIB devices on boot fail

On Friday, 18.05.2018 at 11:31 -0600, Jason Gunthorpe wrote:
> On Fri, May 18, 2018 at 06:22:12PM +0200, Benjamin Drung wrote:
> > > Hmm, that is very mysterious, then, I can't think how
> > > systemd-modules-load could still be running at this point.
> > >
> > > If you load the ib driver in initrd then the above should have been
> > > scheduled very early in boot, and it has a Before=network-pre.target
> > > which should delay networking.service from starting while it is
> > > running.
> > >
> > > What does the logging say about when rdma-load-modules was started
> > > and was the IB device created before the initrd device exited?
> > 
> > I opened a bug report against systemd in Debian:
> > https://bugs.debian.org/899002
> > 
> > Then I tried to implement a workaround (which does not work):
> > 
> > $ cat /etc/systemd/system/networking.service.d/rdma.conf
> > [Service]
> > # Work around systemd bug https://bugs.debian.org/899002
> > # See also https://marc.info/?l=linux-rdma&m=152639629213650&w=2
> > ExecStartPre=/bin/ps auxff
> > ExecStartPre=/bin/ls -l /sys/class/infiniband
> > ExecStartPre=/bin/systemctl status rdma-load-modules@infiniband.service
> > ExecStartPre=/bin/sh -c 'while pid=$(pidof -s systemd-modules-load); do echo "Waiting for systemd-modules-load process $pid to exit..."; tail --pid=$pid -f /dev/null; done'
> > 
> > systemctl status says that rdma-load-modules@infiniband.service was
> > started one second after networking.service.
> > 
> > The ps command from ExecStartPre says that only systemd-journald,
> > systemd-udevd, multipathd, and init were running. "ls -l
> > /sys/class/infiniband" says that mlx4_0 is present. And "systemctl
> > status rdma-load-modules@infiniband.service" says:
> > 
> > rdma-load-modules@infiniband.service - Load RDMA modules from /etc/rdma/modules/infiniband.conf
> >   Loaded: loaded (/lib/systemd/system/rdma-load-modules@.service; static; vendor preset: enabled)
> >   Active: inactive (dead)
> >     Docs: file:/usr/share/doc/rdma-core/udev.md
> > 
> > So it is clear that rdma-load-modules@infiniband.service is not
> > triggered when networking.service is started.
> 
> Hum, if you have the modules in the initrd then udev should schedule
> this service to run essentially immediately on boot, and it should
> become ordered properly.
> 
> I.e., the RDMA device should already be present when udev is started.
> 
> Starting *after* networking.service suggests that the mlx4 RDMA device
> was hotplugged into the system a long time after early boot! Which is
> not at all what I expect.
> 
> What does dmesg say about the mlx4 driver load?

I booted with break=bottom and listed the loaded modules in the initrd.
They were:

mlx4_ib
ib_sa
ib_mad
ib_core
ib_addr
ib_netlink
mlx4_core
mlx_compat
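
A list like the one above can also be checked mechanically. The following is only a minimal sketch: it reads a /proc/modules-style file and reports whether a few of the module names from the list are present. The `MODULES_FILE` variable and the `is_loaded` helper are names made up for this illustration, and which modules are actually required depends on the hardware and driver stack.

```shell
#!/bin/sh
# Sketch: report which expected modules appear in a /proc/modules-style file.
# MODULES_FILE is a hypothetical override; the kernel's own list is /proc/modules.
MODULES_FILE="${MODULES_FILE:-/proc/modules}"

# is_loaded NAME: succeeds if some line starts with "NAME ".
# Each line of /proc/modules begins with the module name followed by a space.
is_loaded() {
    grep -q "^$1 " "$MODULES_FILE"
}

# The names below are taken from the list in this mail; adjust as needed.
for mod in mlx4_core mlx4_ib ib_core; do
    if is_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: missing"
    fi
done
```

Pointing `MODULES_FILE` at a saved copy of the list makes it possible to compare the initrd state with the state after boot.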

> Upstream blocks module completion until the driver is done (this takes
> a long time). Is it possible that MOFED does this async? That could
> explain everything.
> 
> Also, IMHO, the networking.service above is wrong. It should not
> attempt to do udevadm settle internally, but it must depend on
> systemd-udev-settle.service.
> 
> The reason is how systemd schedules ordering. Once it starts
> running networking.service 'ExecStartPre' it will not reconsider
> order past that point. So any activations done by udev while settling
> have no impact on networking.service at all.
> 
> Having it depend on systemd-udev-settle.service means it gets to
> recheck ordering after settle is done, but before starting
> networking.service - which is the behavior it is really trying to get.
> 
> That may be a big part of this bug, go back to doing:
> 
> After=systemd-udev-settle.service
> Requires=systemd-udev-settle.service

You are right. I modified networking.service accordingly and it works
as expected now. I sent a patch for ifupdown to Debian, but a
discussion about the fix has started: https://bugs.debian.org/899002
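
For anyone who wants to try the same ordering without editing the shipped unit, a drop-in along the following lines should be roughly equivalent. This is a sketch, not the patch sent to Debian: the file name `udev-settle.conf` and the `UNIT_DIR` variable are assumptions made for illustration, and `UNIT_DIR` defaults to a scratch directory so that nothing under /etc is touched unless it is set explicitly.

```shell
#!/bin/sh
# Sketch: write a drop-in that orders networking.service after udev settle.
# UNIT_DIR is a hypothetical override; set it to /etc/systemd/system to
# apply the drop-in to a real system. By default a scratch directory is used.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
dropin_dir="$UNIT_DIR/networking.service.d"

mkdir -p "$dropin_dir"

# After= and Requires= are ordering/dependency settings, so they belong in
# the [Unit] section (unlike ExecStartPre=, which lives in [Service]).
cat > "$dropin_dir/udev-settle.conf" <<'EOF'
[Unit]
After=systemd-udev-settle.service
Requires=systemd-udev-settle.service
EOF

echo "Wrote $dropin_dir/udev-settle.conf"
# When writing to the real unit directory, reload systemd afterwards:
#   systemctl daemon-reload
```

After a reboot, `systemd-analyze critical-chain networking.service` should show whether systemd-udev-settle.service now sits ahead of networking.service.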

-- 
Benjamin Drung
System Developer
Debian & Ubuntu Developer

ProfitBricks GmbH
Greifswalder Str. 207
10405 Berlin

Email: benjamin.drung@xxxxxxxxxxxxxxxx
URL: https://www.profitbricks.de

Registered office: Berlin
Commercial register: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens