Re: commit to kernel fails since Debian 12 (bookworm)

Hi Markus,

On Mon, Oct 16, 2023 at 01:02:35PM +0000, Markus Wigge wrote:
> >> With each received message I got a "device or resource busy" when conntrackd
> >> tried to commit it to the kernel.
> >>
> >> When I try to commit the cache now I get all the same errors but at once ;-)
> >
> > That means there is already an entry in the kernel.
>
> Is there any known change between bullseye and bookworm that might
> explain this? Unfortunately I am not so deep inside the kernel mechanics
> involved here.

The only spots where EBUSY could reasonably happen in the kernel are here:

static int
ctnetlink_update_status(struct nf_conn *ct, const struct nlattr * const cda[])
{
        unsigned int status = ntohl(nla_get_be32(cda[CTA_STATUS]));
        unsigned long d = ct->status ^ status;

        if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY))
                /* SEEN_REPLY bit can only be set */
                return -EBUSY;

        if (d & IPS_ASSURED && !(status & IPS_ASSURED))
                /* ASSURED bit can only be set */
                return -EBUSY;

And this EBUSY can only happen if userspace (conntrackd) is losing the
race to update an already existing entry in the kernel.
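
To make the check concrete, here is a minimal userspace sketch that models
only the two tests quoted above. The function name model_update_status and
its two parameters are hypothetical, made up for illustration; the bit
values are the ones from the kernel's nf_conntrack_common.h uapi header.
The real function does more than this.

```c
#include <errno.h>

/* Bit values as defined in the kernel's nf_conntrack_common.h uapi header. */
#define IPS_SEEN_REPLY (1u << 1)
#define IPS_ASSURED    (1u << 2)

/*
 * Hypothetical userspace model of the two checks quoted above.
 * 'current' is the status of the entry that already exists in the kernel,
 * 'requested' is the status that userspace (conntrackd) tries to commit.
 */
int model_update_status(unsigned int current, unsigned int requested)
{
        /* Bits that differ between the existing and the requested status. */
        unsigned int changed = current ^ requested;

        /* SEEN_REPLY bit can only be set, never cleared. */
        if ((changed & IPS_SEEN_REPLY) && !(requested & IPS_SEEN_REPLY))
                return -EBUSY;

        /* ASSURED bit can only be set, never cleared. */
        if ((changed & IPS_ASSURED) && !(requested & IPS_ASSURED))
                return -EBUSY;

        return 0;
}
```

For example, if the packet path already created an entry with both bits
set and conntrackd then tries to commit the same flow with only
IPS_SEEN_REPLY, the model returns -EBUSY, because that update would clear
IPS_ASSURED.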

> >> The architecture is quite simple and used to work since several years. It
> >> started flooding the syslog with dist-upgrade to "bookworm".
> >> Two active-active nodes share a bunch of VLANs in two keepalived groups.
> >>
> >> Each node is primary for one of the groups and secondary for the other. The
> >> interfaces are configured correctly and traffic is flowing as expected.
> >
> > That is, flow-based distribution between the firewalls, correct?
>
> I am not sure about your definition of flow-based but it sounds
> plausible. Each node is responsible for its own dedicated VLANs they
> only failover on reboot or upgrades etc.

So VLAN interfaces are distributed between nodes and, on failover, one
node picks up the VLAN interfaces of the node that is failing? I am
trying to understand if, in your setup, one node is active but is
also at the same time a backup for the flows that are handled by the
other node.

> >> bird and bird6 are announcing the routes correctly on each side.
> >> Shorewall is used to filter the passing traffic. Thats all.
> >>
> >>>
> >>> EBUSY can be triggered in nf_conntrack_netlink.c in a few spots, this
> >>> is most likely ct status flags and conntrackd losing race to update
> >>> and entry that is being picked up from packet path.
> >>>
> >>> Is your ruleset dropping invalid packets to disable lazy pick up?
> >>> That is, nf_conntrack_tcp_loose sysctl is set to zero.
> >>
> >> nope:
> >> # sysctl -a | grep loose
> >> net.netfilter.nf_conntrack_dccp_loose = 1
> >> net.netfilter.nf_conntrack_tcp_loose = 1
> >
> > If _loose is enabled, that means kernel conntrack can pick up entries
> > from the middle base from packet path.
>
> I don't understand this part. The kernel picks up connections
> automatically? But how when the flow started on the other node?

This is how it works with net.netfilter.nf_conntrack_tcp_loose = 1:
that toggle enables "poor man's" connection pickup, that is, the kernel
infers the current state from the middle of the connection.
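
As a rough illustration of that pickup decision, here is a heavily
simplified sketch. It is not the kernel's code: the enum, the function
name model_creates_entry and the reduction of a TCP packet to "SYN" versus
"mid-stream" are all assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical, simplified classification of a TCP packet that has no
 * matching conntrack entry yet. */
enum pkt_kind {
        PKT_SYN,        /* first packet of a new connection */
        PKT_MIDSTREAM   /* any packet of a connection already in progress */
};

/*
 * Simplified model: returns true if the packet path would create a new
 * conntrack entry for this packet, false if the packet stays untracked
 * (it would be classified as INVALID).
 */
bool model_creates_entry(enum pkt_kind kind, bool tcp_loose)
{
        if (kind == PKT_SYN)
                return true;    /* regular connection setup */

        /* Mid-stream pickup happens only when the loose toggle is enabled. */
        return tcp_loose;
}
```

With tcp_loose enabled, a mid-stream packet arriving on the backup node
is enough for the model to create an entry, which is the situation where
a later commit from conntrackd finds the entry already present.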

> > Is your ruleset dropping invalid packets?
>
> Only for smurfs as far as I can see:
> >  203M   19G smurfs     0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID,NEW,UNTRACKED
>
> > Chain smurfs (7 references)
> >  pkts bytes target     prot opt in     out     source               destination
> >   19M 6211M RETURN     0    --  *      *       0.0.0.0              0.0.0.0/0
> >     0     0 smurflog   0    --  *      *       0.0.0.0/0            0.0.0.0/0           [goto]  ADDRTYPE match src-type BROADCAST
> >     0     0 smurflog   0    --  *      *       224.0.0.0/4          0.0.0.0/0           [goto]

This RETURN means you take invalid packets back to the chain where the
jump to smurfs happens.

> > It looks like conntrackd is getting late to synchronize the states
> > for some flows because the packet path already created the entry via
> > _loose mechanism.
>
> Following the logs it appears to me that every single entry is getting
> late then. I doubt that and don't see where state should come from
> beforehand.

From the datapath itself, via the _loose mechanism that is enabled.


