Hey James,
Thanks! Responses below
On Wed, Dec 1, 2021 at 1:12 PM James Feeney <james@xxxxxxxxxxx> wrote:
On 12/1/21 07:20, Brian Hutchinson wrote:
> ...
> In the .service file I tried all I know to ensure the required interfaces were created before starting ptp4l, in an attempt to give bonding enough time to finish, but binding to things like sys-subsystem-net-devices-bond1.device wasn't enough.
>
> Is it also possible to use carrier state in .service file?
>
> I see sys/devices/virtual/net/bond1/carrier but not sure how to only attempt to start my ptp4l service after carrier state is "1".
>
> I welcome your ideas and suggestions on how to start a service after a bond interface is really up.
With systemd, the proper way to set up network bonding is to establish ordering with the use of "target" files, which can be added to /etc/systemd/system.
The target files themselves need not contain anything, though I have these with simply:
[Unit]
Documentation= man:systemd.target(5)
My configuration provides automatic bonding and bridging for removable/pluggable and fixed hardwired, wireless, and virtual interfaces, using hardlinked template files and a separate network configuration file, as /etc/conf.d/network, though you are only looking for bonding here. The big advantage with using systemd as the network configuration system, compared to alternatives, is that it "just works", and doesn't break after someone else's "upgrade".
Your hardware situation is certainly more interesting than mine with hotplug stuff ... in the old days I had to do udev rules for stuff like that, but with this project I decided to finally go with systemd. For the most part it does "just work" until it "doesn't", and I've run into that quite a few times now; this is one of those cases (note I'm on linux-fslc-imx 5.10.69, and I understand some bonding issues have been fixed in 5.11, but I don't think that fix pertains to what I'm seeing here). There are hooks to guarantee the "network is online" before going on ... and they don't work right in this case. You can see from my serial console log that bond1 isn't up until after the login prompt and all the systemd targets have finished! And systemd was told not to start ptp4l until after the network is up, yet you can clearly see it being started before bond1 is up.
The essential idea with configuring virtual network interfaces using systemd target files derives from noting that network service clients and servers must run After bridge and bond master interfaces are working, which implies After configuration of their respective slave interfaces, and that hardware devices can only be enslaved After the master interfaces have been created. These constraints imply the following ordering:
1) master interfaces
2) enslaved interfaces
3) network services
The systemd target files are then inferred between these three stages:
a) master interfaces
b) "go.target"
c) enslaved interfaces
d) "ll.target"
e) network services
The target file naming is arbitrary, of course. I use these names from arbitrarily choosing the point of view from the template file used to configure each slave device to each master, where finally "ip link set %P master %I".
You could use the terminology "director" and "executive", from corporate structure lingo, instead of "master" and "slave", if preferred, but the ip command still uses the terms "master" and "slave".
A hardware network device Requires go.target and the master interface service file "master@.service" runs Before go.target:
Requires= go.target
Before= go.target
Plugging network hardware, then, will trigger the entire chain of configuration events.
BindsTo= sys-subsystem-net-devices-%i.device
Similarly, for the enslaved interface service file "enslaved@.service":
Requires= go.target
After= go.target
Before= ll.target
And finally, for the various network services service template files:
PartOf= ll.target
Requires= ll.target
After= ll.target
That's the basic idea. Of course, there are plenty of "housekeeping" details in practice. In particular, "Requisite" fails to recognize device units, and instead,
ConditionPathExists= /sys/class/net/%I
is necessary. This appears to me to be an unjustified bug with "Requisite", but - you know - Lennart.
Altogether, to trigger configuration of both master and slave devices from "enslaved@.service":
BindsTo= sys-subsystem-net-devices-%p.device
ConditionPathExists= /sys/class/net/%P
BindsTo= sys-subsystem-net-devices-%i.device
It is useful to impose an arbitrary but strict naming convention with these files, to allow use of systemd specifiers and template files. In your case, you might simply hard-code what you want, if you are not looking for a generic solution, and all you want is bonding on a couple of interfaces.
Still, when properly set up, you can individually "start" and "stop" any of the target units or network service units and get correct behavior.
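To make the three-stage ordering above concrete, here is a minimal sketch that writes skeleton unit files carrying only the dependency lines quoted in this thread. The UNIT_DIR variable, the function name, and the file name netservice@.service are assumptions for illustration, and the service files are deliberately incomplete: they have no [Service] section, so systemd would not accept them as written.

```shell
#!/bin/sh
# Sketch only: skeleton units that show the ordering, not working services.
# UNIT_DIR is an assumed variable; on a real system it would be /etc/systemd/system.
UNIT_DIR="${UNIT_DIR:-./units}"

write_ordering_skeleton() {
    mkdir -p "$UNIT_DIR" || return 1

    # The two otherwise empty synchronization points.
    for t in go ll; do
        printf '[Unit]\nDocumentation= man:systemd.target(5)\n' > "$UNIT_DIR/$t.target"
    done

    # Stage 1: master interfaces are created before go.target.
    cat > "$UNIT_DIR/master@.service" <<'EOF'
[Unit]
Requires= go.target
Before= go.target
EOF

    # Stage 2: slaves are enslaved after go.target and before ll.target.
    cat > "$UNIT_DIR/enslaved@.service" <<'EOF'
[Unit]
Requires= go.target
After= go.target
Before= ll.target
EOF

    # Stage 3: network services start only after ll.target.
    # (netservice@.service is a hypothetical name.)
    cat > "$UNIT_DIR/netservice@.service" <<'EOF'
[Unit]
PartOf= ll.target
Requires= ll.target
After= ll.target
EOF
}
```

Calling write_ordering_skeleton with UNIT_DIR pointed at a scratch directory gives something to diff against real unit files before adding the [Service] sections and the BindsTo=/ConditionPathExists= lines.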
Maybe I'm missing something here but I don't see any way for me to "add targets" to this problem to solve it unless I abandon the systemd way of setting up the bond and wrap my "command line" way of creating the bond (with echo and ip commands) with .target and .service files ... which is going back to init scripts basically.
I guess I could make a .service that calls an ExecCondition= script that could see if /sys/devices/virtual/net/bond1/carrier = 1
AND (/sys/bus/i2c/devices/0-005f/net/lan1/carrier = 1 OR /sys/bus/i2c/devices/0-005f/net/lan2/carrier = 1)
... and start my ptp4l service after that.
But even that would probably need to Restart=on-failure like I have now if those interfaces aren't up yet.
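A rough sketch of what such a check script could look like, assuming the interfaces also appear under /sys/class/net (the usual symlink location). The SYSFS_NET variable and the function names are made up for illustration; nothing here has been tried on the target board.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when bond1 and at least one slave have carrier.
# SYSFS_NET is an assumed override (handy for testing); default is /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# carrier_is_up IFACE: true when IFACE's carrier file reads "1".
# Reading carrier can fail while the interface is administratively down;
# that case is treated as "not up".
carrier_is_up() {
    state=$(cat "$SYSFS_NET/$1/carrier" 2>/dev/null) || return 1
    [ "$state" = "1" ]
}

# bond_ready: bond1 AND (lan1 OR lan2)
bond_ready() {
    carrier_is_up bond1 && { carrier_is_up lan1 || carrier_is_up lan2; }
}
```

It could then be wired in with something like ExecCondition=/bin/sh -c '. /usr/local/lib/bond-ready.sh; bond_ready' (the path is hypothetical). One caveat, as I read systemd.service(5): an ExecCondition= exit status of 1 through 254 skips the unit without marking it failed, so Restart=on-failure would probably not retry it.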
I guess I'm just having a bit of buyer's remorse for believing I could rely on network-online.target before going on ... and I can't. https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ "If you are a developer, instead of wondering what to do about network.target, please just fix your program to be friendly to dynamically changing network configuration. That way you will make your users happy because things just start to work, and you will get fewer bug reports as your stuff is just rock solid. You also make the boot faster for your users, as they don't have to delay arbitrary services for the network anymore" ... I guess the systemd bonding implementor didn't abide by all that. ;)
I've read more and I've usurped the default action of systemd-networkd-wait-online.service to be more specific on which interfaces to wait on and what states they need to be in before moving on:
# SPDX-License-Identifier: LGPL-2.1+
#
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
[Unit]
Description=Wait for Network to be Configured
Documentation=man:systemd-networkd-wait-online.service(8)
DefaultDependencies=no
Conflicts=shutdown.target
Requires=systemd-networkd.service
After=systemd-networkd.service
Before=network-online.target shutdown.target
[Service]
Type=oneshot
ExecStart=/lib/systemd/systemd-networkd-wait-online --interface bond1:degraded-carrier:carrier --interface lan1:carrier
RemainAfterExit=yes
[Install]
WantedBy=network-online.target
... and it still doesn't work ... you can clearly see "Sync Microchip PHC with PTP Grand Master Clock" (my ptp4l.service) being called before the bond1 is online ... which doesn't happen until after the login prompt:
[ OK ] Reached target Network.
[ 4.096782] imx-sdma 302c0000.dma-controller: firmware found.
[ OK ] Reached targe[ 4.104764] imx-sdma 302c0000.dma-controller: loaded firmware 4.5
t Network is Online.[ 4.109828] caam-snvs 30370000.caam-snvs: violation handlers armed - init state
[ OK ] Reached target Host and Network Name Lookups.
Starting Avahi mDNS/DNS-SD Stack...
Starting Enable ksz9567...
Starting The NGINX HTTP and reverse proxy server...
Starting Sync M[ 4.189072] imx-sdma 302b0000.dma-controller: firmware found.
icrochip PH…with PTP Grand Master Clock...
[ OK ] Started Enable ksz9567.
[FAILED] Failed to start Sync Micro…C with PTP Grand Master Clock.
See 'systemctl status ptp4l.service' for details.
[ OK ] Started The NGINX HTTP and reverse proxy server.
[ 4.254479] imx-sdma 30bd0000.dma-controller: firmware found.
[ OK ] Started Avahi mDNS/DNS-SD Stack.
[ 4.413378] ksz9477-switch 0-005f lan1: configuring for phy/gmii link mode
[ 4.427011] bond1: (slave lan1): Enslaving as a backup interface with a down link
[ 4.501283] ksz9477-switch 0-005f lan2: configuring for phy/gmii link mode
[ 4.511903] bond1: (slave lan2): Enslaving as a backup interface with a down link
Starting Save/Restore Sound Card State...
[ OK ] Started Save/Restore Sound Card State.
[ OK ] Reached target Sound Card.
[ 5.009993] random: crng init done
[ 5.013414] random: 7 urandom warning(s) missed due to ratelimiting
[ OK ] Started Load/Save Random Seed.
[ OK ] Started System Logger Daemon "default" instance.
[ OK ] Reached target Multi-User System.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Started Update UTMP about System Runlevel Changes.
Poky (Yocto Project Reference Distro) 3.1.7 imx8mmevk ttymxc1
imx8mmevk login: [ 7.531146] ksz9477-switch 0-005f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 8.873069] bond1: (slave lan1): link status definitely up, 1000 Mbps full duplex
[ 8.882016] bond1: (slave lan1): making interface the new active one
[ 8.892488] device eth0 entered promiscuous mode
[ 8.897180] audit: type=1700 audit(1600598644.664:2): dev=eth0 prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
[ 8.913688] bond1: active interface up!
[ 8.917595] IPv6: ADDRCONF(NETDEV_CHANGE): bond1: link becomes ready
systemctl status ptp4l
* ptp4l.service - Sync Microchip PHC with PTP Grand Master Clock
Loaded: loaded (/etc/systemd/system/ptp4l.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2020-09-20 10:44:01 UTC; 40s ago
Process: 332 ExecStart=/usr/bin/ptp4l -f /etc/linuxptp/ptp4l.conf_e2e_one_step_g8275.2 -s -i bond1 (code=exited, status=255/EXCEPTION)
Main PID: 332 (code=exited, status=255/EXCEPTION)
Sep 20 10:44:01 imx8mmevk systemd[1]: Starting Sync Microchip PHC with PTP Grand Master Clock...
Sep 20 10:44:01 imx8mmevk ptp4l[332]: [5.601] interface 'bond1' does not support requested timestamping mode
Sep 20 10:44:01 imx8mmevk ptp4l[332]: failed to create a clock
Sep 20 10:44:01 imx8mmevk systemd[1]: ptp4l.service: Main process exited, code=exited, status=255/EXCEPTION
Sep 20 10:44:01 imx8mmevk systemd[1]: ptp4l.service: Failed with result 'exit-code'.
Sep 20 10:44:01 imx8mmevk systemd[1]: Failed to start Sync Microchip PHC with PTP Grand Master Clock.
cat ptp4l.service
[Unit]
Description=Sync Microchip PHC with PTP Grand Master Clock
#Requires=network-online.target multi-user.target
#BindsTo=sys-subsystem-net-devices-bond1.device sys-subsystem-net-devices-lan1.device sys-subsystem-net-devices-lan2.device multi-user.target
#After=sys-subsystem-net-devices-bond1.device sys-subsystem-net-devices-lan1.device sys-subsystem-net-devices-lan2.device multi-user.target
After=network-online.target
Wants=network-online.target
[Service]
Type=exec
#NotifyAccess=all
ExecStart=/usr/bin/ptp4l -f /etc/linuxptp/ptp4l.conf_e2e_one_step_g8275.2 -s -i bond1
#Restart=on-failure
#RestartSec=1
[Install]
WantedBy=multi-user.target
...but after logging in and running systemctl restart ptp4l everything works. This is a straight up race condition during startup ... and I don't know how to fix it the "systemd" way. Am I doing something wrong or is something in systemd bonding broken???
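For reference, the stopgap I have in mind (not a proper "systemd way" answer) is to poll the bond's carrier with a timeout before ptp4l starts. This is only a sketch: SYSFS_NET, the function name, and the 30 second figure below are assumptions, and I haven't confirmed that carrier on bond1 is the right thing to wait for to avoid the timestamping error.

```shell
#!/bin/sh
# Hypothetical helper: block until an interface reports carrier, or give up.
# SYSFS_NET is an assumed override (handy for testing); default is /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# wait_for_carrier IFACE TIMEOUT_SECONDS
# Returns 0 as soon as IFACE's carrier file reads "1",
# or 1 once roughly TIMEOUT_SECONDS have passed without that happening.
wait_for_carrier() {
    iface=$1
    remaining=$2
    while :; do
        # A missing file or a failed read counts as "no carrier yet".
        state=$(cat "$SYSFS_NET/$iface/carrier" 2>/dev/null) || state=""
        if [ "$state" = "1" ]; then
            return 0
        fi
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

In ptp4l.service this could be hooked in with something like ExecStartPre=/bin/sh -c '. /usr/local/lib/wait-carrier.sh; wait_for_carrier bond1 30' (hypothetical path), which delays ExecStart= until the bond reports carrier or the wait times out.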
Regards,
Brian