Hi Markus,

On Mon, Oct 16, 2023 at 01:02:35PM +0000, Markus Wigge wrote:
> >> With each received message I got a "device or resource busy" when conntrackd
> >> tried to commit it to the kernel.
> >>
> >> When I try to commit the cache now I get all the same errors but at once ;-)
> >
> > That means there is already an entry in the kernel.
>
> Is there any known change between bullseye and bookworm that might
> explain this? Unfortunately I am not so deep inside the kernel mechanics
> involved here.

The only spot where EBUSY can reasonably happen in the kernel is here:

static int ctnetlink_update_status(struct nf_conn *ct,
                                   const struct nlattr * const cda[])
{
        unsigned int status = ntohl(nla_get_be32(cda[CTA_STATUS]));
        unsigned long d = ct->status ^ status;

        if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY))
                /* SEEN_REPLY bit can only be set */
                return -EBUSY;

        if (d & IPS_ASSURED && !(status & IPS_ASSURED))
                /* ASSURED bit can only be set */
                return -EBUSY;

This EBUSY can only happen if userspace (conntrackd) is losing the race to
update an already existing entry in the kernel.

> >> The architecture is quite simple and has worked for several years. It
> >> started flooding the syslog with the dist-upgrade to "bookworm".
> >> Two active-active nodes share a bunch of VLANs in two keepalived groups.
> >>
> >> Each node is primary for one of the groups and secondary for the other. The
> >> interfaces are configured correctly and traffic is flowing as expected.
> >
> > That is, flow-based distribution between the firewalls, correct?
>
> I am not sure about your definition of flow-based but it sounds
> plausible. Each node is responsible for its own dedicated VLANs; they
> only fail over on reboot or upgrades etc.

So VLAN interfaces are distributed between the nodes and, on failover, one
node picks up the VLAN interfaces of the node that is failing?
I am trying to understand if, in your setup, one node is active while also
acting, at the same time, as a backup for the flows that are handled by the
other node.

> >> bird and bird6 are announcing the routes correctly on each side.
> >> Shorewall is used to filter the passing traffic. That's all.
> >>
> >>> EBUSY can be triggered in nf_conntrack_netlink.c in a few spots; this
> >>> is most likely the ct status flags and conntrackd losing the race to
> >>> update an entry that is being picked up from the packet path.
> >>>
> >>> Is your ruleset dropping invalid packets to disable lazy pick-up?
> >>> That is, is the nf_conntrack_tcp_loose sysctl set to zero?
> >>
> >> nope:
> >> # sysctl -a | grep loose
> >> net.netfilter.nf_conntrack_dccp_loose = 1
> >> net.netfilter.nf_conntrack_tcp_loose = 1
> >
> > If _loose is enabled, that means kernel conntrack can pick up entries
> > from the middle of a flow, from the packet path.
>
> I don't understand this part. The kernel picks up connections
> automatically? But how, when the flow started on the other node?

This is how it works with net.netfilter.nf_conntrack_tcp_loose = 1: that
toggle enables "poor man's" connection pickup, that is, the kernel infers
the current state from the middle of the connection.

> > Is your ruleset dropping invalid packets?
>
> Only for smurfs as far as I can see:
>
> 203M   19G smurfs      0 --  *  *   0.0.0.0/0   0.0.0.0/0   ctstate INVALID,NEW,UNTRACKED
>
> Chain smurfs (7 references)
>  pkts bytes target   prot opt in out source      destination
>   19M 6211M RETURN      0 --  *  *   0.0.0.0     0.0.0.0/0
>     0     0 smurflog    0 --  *  *   0.0.0.0/0   0.0.0.0/0   [goto] ADDRTYPE match src-type BROADCAST
>     0     0 smurflog    0 --  *  *   224.0.0.0/4 0.0.0.0/0   [goto]

This RETURN means invalid packets go back to the chain where the jump to
smurfs happens.

> > It looks like conntrackd is getting late to synchronize the states
> > for some flows because the packet path already created the entry via
> > the _loose mechanism.
>
> Following the logs it appears to me that every single entry is getting
> late then. I doubt that and don't see where state should come from
> beforehand.

From the datapath itself, from the _loose mechanism that is enabled.
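If you want conntrackd, not the datapath, to be the one creating these
entries, the usual approach is to turn off loose pickup and drop INVALID
packets early. A sketch (fold this into your sysctl and Shorewall
configuration as appropriate; the FORWARD chain here is only an example):

```shell
# Disable mid-stream TCP pickup so the packet path does not create
# entries for flows it has not seen from the start.
sysctl -w net.netfilter.nf_conntrack_tcp_loose=0

# Drop packets that conntrack cannot associate with a known flow,
# instead of letting them re-create state behind conntrackd's back.
iptables -A FORWARD -m conntrack --ctstate INVALID -j DROP
```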