On Fri, May 18, 2018 at 06:22:12PM +0200, Benjamin Drung wrote:
> > Hmm, that is very mysterious, then, I can't think how
> > systemd-modules-load could still be running at this point.
> >
> > If you load the ib driver in initrd then the above should have been
> > scheduled very early in boot, and it has a Before=network-pre.target
> > which should delay networking.service from starting while it is
> > running.
> >
> > What does the logging say about when rdma-load-modules was started
> > and was the IB device created before the initrd device exited?
>
> I opened a bug report against systemd in Debian:
> https://bugs.debian.org/899002
>
> Then I tried to implement a workaround (which does not work):
>
> $ cat /etc/systemd/system/networking.service.d/rdma.conf
> [Service]
> # Work around systemd bug https://bugs.debian.org/899002
> # See also https://marc.info/?l=linux-rdma&m=152639629213650&w=2
> ExecStartPre=/bin/ps auxff
> ExecStartPre=/bin/ls -l /sys/class/infiniband
> ExecStartPre=/bin/systemctl status rdma-load-modules@infiniband.service
> ExecStartPre=/bin/sh -c 'while pid=$(pidof -s systemd-modules-load); do echo "Waiting for systemd-modules-load process $pid to exit..."; tail --pid=$pid -f /dev/null; done'
>
> systemctl status says that rdma-load-modules@infiniband.service was
> started one second after networking.service.
>
> The ps command from ExecStartPre says that only systemd-journald,
> systemd-udevd, multipathd, and init were running. "ls -l
> /sys/class/infiniband" says that mlx4_0 is present. And "systemctl
> status rdma-load-modules@infiniband.service" says:
>
> rdma-load-modules@infiniband.service - Load RDMA modules from /etc/rdma/modules/infiniband.conf
>    Loaded: loaded (/lib/systemd/system/rdma-load-modules@.service; static; vendor preset: enabled)
>    Active: inactive (dead)
>      Docs: file:/usr/share/doc/rdma-core/udev.md
>
> So it is clear, that rdma-load-modules@infiniband.service is not
> triggered when networking.service is started.

Hum, if you have the modules in the initrd then udev should schedule
this service to run essentially immediately on boot, and it should
become ordered properly, i.e. the rdma device should already be
present when udev is started.

Starting *after* networking.service suggests that the mlx4 RDMA device
was hotplugged into the system a long time after early boot! That is
not at all what I expect.

What does dmesg say about the mlx4 driver load? Upstream blocks module
load completion until the driver is done (this takes a long time). Is
it possible that MOFED does this async? That could explain everything.

Also, IMHO, the networking.service above is wrong. It should not
attempt to do udevadm settle internally; instead it must depend on
systemd-udev-settle.service.

The reason is how systemd schedules ordering. Once it starts running
the 'ExecStartPre' of networking.service it will not reconsider the
ordering past that point, so any activations done by udev while
settling have no impact on networking.service at all.

Having it depend on systemd-udev-settle.service means systemd gets to
recheck ordering after settle is done, but before starting
networking.service, which is the behavior it is really trying to get.

That may be a big part of this bug. Go back to doing:

  After=systemd-udev-settle.service
  Requires=systemd-udev-settle.service

Jason
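
For reference, a minimal sketch of how those two directives could be
carried as a drop-in rather than by editing the unit file itself. The
file name udev-settle.conf is a hypothetical choice; the directory is
the one already used for rdma.conf in the quoted message. Both
directives belong in the [Unit] section, and this assumes nothing else
in the unit overrides them:

```ini
# /etc/systemd/system/networking.service.d/udev-settle.conf
# (hypothetical file name, shown for illustration only)
[Unit]
# Order networking.service after udev has settled ...
After=systemd-udev-settle.service
# ... and pull the settle unit into the transaction.
Requires=systemd-udev-settle.service
```

Requires= on its own only pulls the unit in, and it is After= that
provides the ordering, which is why both are listed. A
"systemctl daemon-reload" is needed before systemd picks up a new
drop-in.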