Re: Upgraded multiple systems to systemd 249.3 and all had eth1 not started / configured

Hello,

Thank you for your reply.

I understand that there can be a race.

But when I check the logs, no race is happening.

Let us see and analyze the logs.

Stage 1:
The system boots, and the kernel assigns eth0, eth1 and eth2 as interface names.

Aug 18 09:17:13 kk kernel: e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) e0:d5:5e:8d:7f:2f
Aug 18 09:17:13 kk kernel: e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
Aug 18 09:17:13 kk kernel: e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: FFFFFF-0FF
Aug 18 09:17:13 kk kernel: 8139too 0000:04:00.0 eth1: RealTek RTL8139 at 0x000000000e8fc9bb, 00:e0:4d:05:ee:a2, IRQ 19
Aug 18 09:17:13 kk kernel: r8169 0000:02:00.0 eth2: RTL8168e/8111e, 50:3e:aa:05:2b:ca, XID 2c2, IRQ 129
Aug 18 09:17:13 kk kernel: r8169 0000:02:00.0 eth2: jumbo features [frames: 9194 bytes, tx checksumming: ko]

Stage 2:
Now the udev rules are triggered, and eth0, eth1 and eth2 are renamed to tmpeth0, tmpeth2 and tmpeth1 respectively.

Aug 18 09:17:13 kk kernel: 8139too 0000:04:00.0 tmpeth2: renamed from eth1
Aug 18 09:17:13 kk kernel: e1000e 0000:00:1f.6 tmpeth0: renamed from eth0
Aug 18 09:17:13 kk kernel: r8169 0000:02:00.0 tmpeth1: renamed from eth2

Stage 3:
Now my script is called, and it renames tmpeth0, tmpeth1 and tmpeth2 to eth0, eth1 and eth2.

Aug 18 09:17:13 kk kernel: e1000e 0000:00:1f.6 eth0: renamed from tmpeth0
Aug 18 09:17:14 kk kernel: r8169 0000:02:00.0 eth1: renamed from tmpeth1
Aug 18 09:17:14 kk kernel: 8139too 0000:04:00.0 eth2: renamed from tmpeth2

Effectively, the original eth1 and eth2 are swapped, while eth0 remains eth0.

All of this happened before systemd-networkd started; interface renaming was over by 09:17:14.
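
The rename chain in Stages 2 and 3 can be replayed mechanically to confirm the final mapping. The following is a minimal sketch, not part of the original setup: it assumes the default journal line layout shown above (the new interface name sits immediately before "renamed from"), and `final_names` is a hypothetical helper name.

```shell
#!/bin/sh
# Replay kernel "renamed from" messages read from stdin and print
# "original-name -> final-name" for every interface that was renamed.
final_names() {
    awk '
        / renamed from / {
            # Line shape: "... <driver> <pci-addr> <new>: renamed from <old>"
            old = $NF
            new = $(NF - 3)
            sub(/:$/, "", new)
            # Carry the first (kernel-assigned) name through each rename step.
            root = (old in origin) ? origin[old] : old
            delete origin[old]
            origin[new] = root
        }
        END {
            for (name in origin) print origin[name] " -> " name
        }
    ' | sort
}
```

Fed with the kernel log of the current boot, for example `journalctl -k -b | final_names`, it prints one pair per renamed interface. For the six rename lines above it gives eth0 -> eth0, eth1 -> eth2 and eth2 -> eth1, which matches the swap described. Interfaces that were never renamed are not listed.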

Stage 4:
Now systemd-networkd starts, 2 seconds after all interfaces have been assigned their final names.

Aug 18 09:17:16 kk systemd[1]: Starting Network Configuration...
Aug 18 09:17:17 kk systemd-networkd[426]: lo: Link UP
Aug 18 09:17:17 kk systemd-networkd[426]: lo: Gained carrier
Aug 18 09:17:17 kk systemd-networkd[426]: Enumeration completed
Aug 18 09:17:17 kk systemd[1]: Started Network Configuration.
Aug 18 09:17:17 kk systemd-networkd[426]: eth2: Interface name change detected, renamed to eth1.
Aug 18 09:17:17 kk systemd-networkd[426]: Could not process link message: File exists
Aug 18 09:17:17 kk systemd-networkd[426]: eth1: Failed
Aug 18 09:17:17 kk systemd-networkd[426]: eth1: Interface name change detected, renamed to eth2.
Aug 18 09:17:17 kk systemd-networkd[426]: eth1: Interface name change detected, renamed to tmpeth2.
Aug 18 09:17:17 kk systemd-networkd[426]: eth0: Interface name change detected, renamed to tmpeth0.
Aug 18 09:17:17 kk systemd-networkd[426]: eth2: Interface name change detected, renamed to tmpeth1.
Aug 18 09:17:17 kk systemd-networkd[426]: tmpeth0: Interface name change detected, renamed to eth0.
Aug 18 09:17:17 kk systemd-networkd[426]: tmpeth1: Interface name change detected, renamed to eth1.
Aug 18 09:17:17 kk systemd-networkd[426]: tmpeth2: Interface name change detected, renamed to eth2.
Aug 18 09:17:17 kk systemd-networkd[426]: eth1: Link UP
Aug 18 09:17:17 kk systemd-networkd[426]: eth0: Link UP
Aug 18 09:17:20 kk systemd-networkd[426]: eth0: Gained carrier

At this point eth0 and eth1 are up and configured by systemd-networkd, but eth2 is down and not configured.

None of the .network configuration files match by interface name; they all match only by MAC address.

# sample .network file.

[Match]
MACAddress=e0:d5:5e:8d:7f:2f
Type=ether

[Network]
IgnoreCarrierLoss=yes
LinkLocalAddressing=no
IPv6AcceptRA=no
ConfigureWithoutCarrier=true
Address=192.168.25.2/24
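
Since matching is by MAC address only, it can help to see which MAC address currently sits behind each interface name. A minimal sketch, assuming the usual sysfs layout where every interface has an `address` attribute under /sys/class/net; `list_iface_macs` is a hypothetical helper name, and the optional directory argument exists only to make the function easy to try against a copy of that layout.

```shell
#!/bin/sh
# Print "<interface> <mac>" for every entry that exposes an address attribute.
list_iface_macs() {
    sysnet="${1:-/sys/class/net}"
    for dir in "$sysnet"/*; do
        # Skip entries without a readable address file (or an unmatched glob).
        [ -r "$dir/address" ] || continue
        read -r mac < "$dir/address"
        printf '%s %s\n' "${dir##*/}" "$mac"
    done
}
```

Running `list_iface_macs` after boot and comparing the output with the MACAddress= lines of the .network files shows which file each interface should pick up, independent of the interface name.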

The error message "eth1: Failed" above did not appear with earlier versions of systemd.

So the recent version of systemd-networkd is doing something different, and this is where things go wrong.

Stage 5: (my workaround for this issue)
I wrote a new service file which restarts systemd-networkd after waiting for 10 seconds.

Aug 18 09:17:27 kk systemd[1]: Stopping Network Configuration...
Aug 18 09:17:27 kk systemd[1]: systemd-networkd.service: Deactivated successfully.
Aug 18 09:17:27 kk systemd[1]: Stopped Network Configuration.
Aug 18 09:17:27 kk systemd[1]: Starting Network Configuration...
Aug 18 09:17:27 kk systemd-networkd[579]: eth1: Link UP
Aug 18 09:17:27 kk systemd-networkd[579]: eth0: Link UP
Aug 18 09:17:27 kk systemd-networkd[579]: eth0: Gained carrier
Aug 18 09:17:27 kk systemd-networkd[579]: lo: Link UP
Aug 18 09:17:27 kk systemd-networkd[579]: lo: Gained carrier
Aug 18 09:17:27 kk systemd-networkd[579]: Enumeration completed
Aug 18 09:17:27 kk systemd[1]: Started Network Configuration.
Aug 18 09:17:27 kk systemd-networkd[579]: eth2: Link UP
Aug 18 09:17:27 kk systemd-networkd[579]: eth2: Gained carrier

All interfaces are now up and running as expected.
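
The Stage 5 workaround amounts to a delayed restart. A minimal sketch of what such a service could run is shown here; only the 10 second wait and the restart of systemd-networkd come from the description above, while the function name `restart_networkd_later` and the idea of calling it from a oneshot unit are assumptions.

```shell
#!/bin/sh
# Wait, then restart systemd-networkd so it re-enumerates the links
# under their final names. The delay defaults to 10 seconds.
restart_networkd_later() {
    delay="${1:-10}"
    sleep "$delay"
    systemctl restart systemd-networkd.service
}
```

A fixed delay is only a stopgap: it works here because the renames finish within about a second of boot, but it does not explain why the first enumeration fails.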

Please check. I do not believe a race is causing this issue; to me it looks like a logic change in systemd-networkd is the cause.

Thank you and regards,

Amish


On 17/08/21 3:18 pm, Colin Guthrie wrote:
Hiya,

As has been said, this is racy. "Sufficiently early" is just a hope, rather than a guarantee. Perhaps something in the kernel made things more or less efficient (try booting with the old kernel to see if it helps, but as this is a race, it may only work some of the time). Or perhaps some unit ordering changed so make this better? Perhaps udev settle units have now been dropped and thus the boot is faster and things happen in a more hotplug oriented way? Lots of possibilities for why this no longer works (and even before it definitely wasn't a guaranteed or recommended approach).

As has been said, you're best to pick a different "namespace" lan0 wan0 wan1 etc. if you can but if you can't change this due to some legacy scripts, at least pick sufficiently high ethN numbers to stay out of the way of the kernel, e.g. if you have three eth cards, then pick your names starting from e.g. 5: eth5, eth6, eth7 and thus you can avoid this dance with temporary names (although I'd still recommend using different names altogether if you can).

Hope that helps.

Col

Amish wrote on 16/08/2021 13:38:

On 16/08/21 5:39 pm, Lennart Poettering wrote:
On Mo, 16.08.21 17:31, Amish (anon.amish@xxxxxxxxx) wrote:

On 16/08/21 5:25 pm, Lennart Poettering wrote:
On Mo, 16.08.21 16:09, Amish (anon.amish@xxxxxxxxx) wrote:

Some old scripts that we have expect interface names starting with eth. But
those names are not predictable.

So to get predictable names starting with eth*, first I temporarily rename
all interfaces to tmpeth*. This is done via udev rules.

SUBSYSTEM=="net", ACTION="" ATTR{address}=="XX:XX:XX:XX:XX:XX", NAME="tmpeth0"
SUBSYSTEM=="net", ACTION="" ATTR{address}=="XX:XX:XX:XX:XX:YY", NAME="tmpeth1"
SUBSYSTEM=="net", ACTION="" ATTR{address}=="XX:XX:XX:XX:XX:ZZ", NAME="tmpeth2"

Then I have a small service (script) which runs before network-pre.target to
convert these names back to eth*

# Search for network interfaces whose names start with "tmpeth" and rename them to "eth".
/usr/bin/find /sys/class/net -maxdepth 1 -name "tmpeth[0-9]" -type l -printf "%f\n" |
while read -r tmpiface; do
    /usr/bin/ip link set dev "$tmpiface" name "$(echo "$tmpiface" | sed s/tmpeth/eth/)"
done

This ensures that I have predictable names starting with eth*. And it has
been working fine for 2-3 years. Even with the current issue, name assignment
is working fine.

This cannot work and is necessarily racy. Stay out of the ethXYZ
namespace, that's the kernel's namespace. Pick any other names,
i.e. "foobar0", "foobar1", but otherwise you just have a racy racy
mess, because the kernel might take the name whenever it pleases.

No, I don't think this is a race, because my script runs after udev has
finished assigning the interface names.

Device probing can take any time it wants. There isn't a point in time
where everything is probed.

These are internal PCI LAN cards. I believe these get probed (and named) sufficiently early.

And then we can expect names assigned by Udev to remain same.

And I can see in the logs that names are not changed after my script runs.

Also, this has been working successfully for me for 2 or more years.

But after today's update, something is breaking all the systems.

Additionally, just now on another system I see eth2 (instead of eth1) being renamed to eth0.

I just want to know what changed and where? (Kernel or Systemd?).

*Another point: I have set ConfigureWithoutCarrier=yes in the .network files and all addresses are static, so systemd-networkd should have configured the devices even if the links are not up. But it's not doing that anymore either after today's update.*

Regards

Amish.

Lennart

--
Lennart Poettering, Berlin


