My bad.
I tried it again and it worked as you described after a machinectl reboot, so the missing reboot may have been the issue. networkctl reload/reconfigure seemed to behave inconsistently at times.
Thank you very much for your help and explanations.
On Wed, 12 Jul 2023 at 11:46, Andrei Borzenkov <arvidjaar@xxxxxxxxx> wrote:
Please use either reply-to-all or reply-to-list. Do not send personal
replies to a public list discussion.
On Wed, Jul 12, 2023 at 12:01 PM LunarLambda <lunarlambda@xxxxxxxxx> wrote:
>
> What about the GatewayOnLink= inside the container? Isn't it meant for exactly this? Why does ip r ... onlink work but doing it via networkd doesn't?
>
Not sure. I tested it on openSUSE Tumbleweed with systemd 253.5 and it
seems to work:
tumbleweed:/run/systemd/network # cat dummy.netdev
[NetDev]
Kind=dummy
Name=dummy0
MACAddress=none
tumbleweed:/run/systemd/network # cat dummy0.network
[Match]
Name=dummy0
[Network]
Address=92.0.0.1/32
[Route]
Gateway=37.0.0.2
GatewayOnLink=true
tumbleweed:/run/systemd/network # ip a
5: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether be:3a:49:74:6b:7a brd ff:ff:ff:ff:ff:ff
    inet 92.0.0.1/32 scope global dummy0
       valid_lft forever preferred_lft forever
    inet6 fe80::bc3a:49ff:fe74:6b7a/64 scope link
       valid_lft forever preferred_lft forever
tumbleweed:/run/systemd/network # ip r
default via 37.0.0.2 dev dummy0 proto static onlink
tumbleweed:/run/systemd/network #
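
To picture why the onlink flag matters here, a toy model in Python may help. It is only a sketch, not kernel code: the RoutingTable class and its method names are made up for illustration, and the reachability check is a simplification (the gateway must be covered by a route that itself has no gateway). It reuses the addresses from the dummy0 test above.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    dst: ipaddress.IPv4Network
    dev: str
    gateway: Optional[ipaddress.IPv4Address] = None
    onlink: bool = False


class RoutingTable:
    """Hypothetical, heavily simplified stand-in for a kernel routing table."""

    def __init__(self) -> None:
        self.routes: List[Route] = []

    def lookup(self, addr: str) -> Optional[Route]:
        # Longest-prefix match: the most specific covering route wins.
        ip = ipaddress.ip_address(addr)
        matches = [r for r in self.routes if ip in r.dst]
        return max(matches, key=lambda r: r.dst.prefixlen, default=None)

    def add_address(self, iface: str, dev: str) -> None:
        # Assigning an address only yields a route for its own prefix.
        # For a /32 that prefix contains nothing but the address itself.
        net = ipaddress.ip_interface(iface).network
        self.routes.append(Route(net, dev))

    def add_route(self, dst: str, dev: str,
                  gateway: Optional[str] = None, onlink: bool = False) -> None:
        # Without onlink, the gateway must already be directly reachable,
        # i.e. covered by a route that has no gateway of its own.
        if gateway is not None and not onlink:
            via = self.lookup(gateway)
            if via is None or via.gateway is not None:
                raise ValueError(f"gateway {gateway} is not directly reachable")
        gw = ipaddress.ip_address(gateway) if gateway else None
        self.routes.append(Route(ipaddress.ip_network(dst), dev, gw, onlink))


table = RoutingTable()
table.add_address("92.0.0.1/32", "dummy0")

# The /32 address gives no route towards 37.0.0.2.
print(table.lookup("37.0.0.2"))  # None

# So a plain default route via 37.0.0.2 is rejected ...
try:
    table.add_route("0.0.0.0/0", "dummy0", gateway="37.0.0.2")
except ValueError as err:
    print("rejected:", err)

# ... while onlink skips the reachability check.
table.add_route("0.0.0.0/0", "dummy0", gateway="37.0.0.2", onlink=True)
print(table.lookup("8.8.8.8").gateway)  # 37.0.0.2
```

In this model, a peer address or an explicit route to the gateway would have the same effect as onlink, because either one adds an entry that covers 37.0.0.2 before the default route is installed.
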
> The old setup used
> ip r add 91.x.x.x dev br0 src 37.x.x.x
>
> Via commands issued in /etc/network/interfaces.
>
> The old route looks like this:
> 91.x.x.x dev br0 scope link src 37.x.x.x
>
> The route created by the network configuration looks like this:
> 91.x.x.x dev br0 proto static
>
> Although I'm not sure this represents a meaningful difference.
>
> On Wed, 12 Jul 2023 at 10:29, Andrei Borzenkov <arvidjaar@xxxxxxxxx> wrote:
>>
>> On Wed, Jul 12, 2023 at 10:44 AM LunarLambda <lunarlambda@xxxxxxxxx> wrote:
>> >
>> > Hello,
>> >
>> > I was recently tasked with moving the existing network configuration for a machine and some nspawn containers from ifupdown to networkd.
>> >
>> > The situation looks as follows:
>> >
>> > A single VPS has 3 IPs. One 37.x.x.x/22, and two 91.x.x.x/32. The 37-ip is to be routed to the main server, whereas the two 91-ips should be routed directly to nspawn containers running on the server. The server uses systemd 247 and the container uses systemd 252, both Debian.
>> >
>> > I created a bridge netdev like so:
>> >
>> > [NetDev]
>> > Name=br0
>> > Kind=bridge
>> > # Matches physical network card
>> > MACAddress=AA:BB:CC:DD:EE:FF
>> >
>> > Bound the physical ethernet to it like so:
>> >
>> > [Match]
>> > Name=ens3
>> >
>> > [Network]
>> > Bridge=br0
>> >
>> > And set up the main IP for the bridge like so:
>> >
>> > [Match]
>> > Name=br0
>> >
>> > [Network]
>> > DNS=...
>> > DNS=...
>> > Address=37.x.x.x/22
>> > Gateway=37.x.x.1
>> >
>> > The nspawn containers are added to the bridge via
>> >
>> > [Network]
>> > Bridge=br0
>> >
>> > Up until this point everything works. However, configuring networking between the host and containers proved quite difficult and I'm unsure whether I'm doing something wrong or networkd is.
>> >
>> > What I tried was the following, inside the container:
>> >
>> > [Match]
>> > Virtualization=container
>> > Name=host0
>> >
>> > [Address]
>> > Address=91.x.x.x/32
>> >
>> > [Route]
>> > Gateway=37.x.x.x
>> > GatewayOnLink=true
>> >
>> > However, this did not create any usable routes to the host, nor did it throw any errors in the journal. Pinging the host does not work.
>> >
>> > Manually creating the routes with ip route did work:
>> >
>> > ip r add 37.x.x.x dev host0 onlink
>> > ip r add default dev host0 via 37.x.x.x
>> >
>> > I tried a variety of combinations of options in the .network file (Scope=, Type=, etc.).
>> >
>> > The only thing that successfully created any routes was the following:
>> >
>> > [Match]
>> > Virtualization=container
>> > Name=host0
>> >
>> > [Address]
>> > Address=91.x.x.x/32
>> > Peer=37.x.x.x/32
>> >
>> > [Network]
>> > Gateway=37.x.x.x
>> >
>> > This strikes me as odd because I could not find this described as necessary anywhere in the documentation or in any online search (beyond the manpage mentioning that Peer= exists).
>> >
>>
>> How is your Linux container supposed to know that to reach host
>> 37.x.x.x it needs to send a packet via the interface with address
>> 91.x.x.x? That is not how Linux routing normally works. You must have
>> a routing entry that tells the kernel how to forward the packet, and
>> assigning address 91.x.x.x to your interface does not magically create
>> any route entry to the network 37.x.x.x. Adding a peer address is one
>> possibility which does it. Another possibility is creating the
>> necessary routes manually like you did.
>>
>> > On the host side, I thought the bridge device, acting on Layer 2, would automatically figure out routes to the containers (via ARP),
>>
>> Bridge (physical or virtual) has nothing to do with routing, it is
>> only using MAC addresses. ARP is used by the kernel to find out the L2
>> address for the destination L3 address which is on the broadcast
>> network. It happens way after the routing decision was already made.
>> So the kernel needs to know that network 37.x.x.x is directly
>> reachable on the broadcast segment to which the interface is connected
>> before the kernel even attempts ARP. That is exactly what your "ip r
>> add 37.x.x.x dev host0 onlink" does. An alternative is to specify a
>> peer address, which implicitly creates a similar routing entry (and
>> the peer can be the whole network).
>>
>> > or that nspawn and networkd would interact in some way to add routes. However, this didn't seem to happen, so I also had to add the following to the bridge's .network file:
>> >
>> > [Route]
>> > Source=37.x.x.x
>> > Destination=91.x.x.A
>> >
>> > [Route]
>> > Source=37.x.x.x
>> > Destination=91.x.x.B
>> >
>>
>> Same as above. Host must know how to forward packets to the addresses
>> 91.x.x.x and without routing entries nothing will tell the host how to
>> do it. Routing is bidirectional; a container knowing how to forward
>> traffic to the host does not automatically imply that the host knows
>> how to forward traffic to the container.
>>
>> > With all of this, everything works fine now. However, since the routes, both host-to-container and container-to-host, differ somewhat from the old (also working) setup,
>>
>> Your working setup must have created the same routing entries because
>> otherwise it would not work. Care to show your old configuration?
>>
>> > and some of the steps necessary I could not find described anywhere, I am left wondering if I fundamentally misunderstand something about how Linux networking works, or if perhaps networkd is behaving oddly because of the IP addresses being considered in different networks.
>>
>> You misunderstand how IP networking works. Nothing in your description
>> is Linux specific.
>>
>> >
>> > I would love to find a conclusive answer to this, especially because I want to make sure I understood the fundamental concepts involved correctly.