Hi Kurt, Dmitry,
On 21/01/2020 21.39, Kurt Van Dijck wrote:
> Maybe move the crosslinking to before the register, then they're
> inaccessible from userspace.
I think I found the problem:
[ 1814.648904] bond5128: (slave vxcan1): Error -22 calling dev_set_mtu
[ 1814.649124] dev_rcv_lists == NULL! 000000008e41fb06 (bond5128)
The bonding netdev bond5128 enslaved the vxcan1 netdev. As vxcan1 is a
CAN netdev with ARPHRD_CAN the bonding process executes
	if (slave_dev->type != ARPHRD_ETHER)
		bond_setup_by_slave(bond_dev, slave_dev);
in bond_enslave() in .../bonding/bond_main.c
Which does this:
static void bond_setup_by_slave(struct net_device *bond_dev,
				struct net_device *slave_dev)
{
	bond_dev->header_ops = slave_dev->header_ops;
	bond_dev->type = slave_dev->type;
	bond_dev->hard_header_len = slave_dev->hard_header_len;
	bond_dev->addr_len = slave_dev->addr_len;
	memcpy(bond_dev->broadcast, slave_dev->broadcast,
	       slave_dev->addr_len);
}
So bond5128 becomes an ARPHRD_CAN interface BUT without having a
netdev_priv() space which contains our lovely can_ml_priv structure with
the dev_rcv_lists for the CAN filters.
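To illustrate the mismatch, here is a minimal userspace model, not kernel
code. All model_* names and the priv_kind tag are made up for illustration;
only the two ARPHRD_* values are taken from if_arp.h. It only shows that
copying dev->type does not change what private data the device carries.

```c
/* Userspace model only; all model_* names are hypothetical. */
#define MODEL_ARPHRD_ETHER	1	/* value of ARPHRD_ETHER in if_arp.h */
#define MODEL_ARPHRD_CAN	280	/* value of ARPHRD_CAN in if_arp.h */

/* Which kind of private data was allocated together with the device. */
enum model_priv_kind { MODEL_PRIV_BOND, MODEL_PRIV_CAN };

struct model_net_device {
	unsigned short type;		/* ARPHRD_* link type */
	enum model_priv_kind priv_kind;	/* fixed at allocation time */
};

/* Mirrors the type copy in bond_setup_by_slave(); priv_kind is untouched. */
void model_setup_by_slave(struct model_net_device *bond_dev,
			  const struct model_net_device *slave_dev)
{
	bond_dev->type = slave_dev->type;
}

/* Returns 1 only if a CAN-typed device really carries CAN private data. */
int model_can_priv_is_valid(const struct model_net_device *dev)
{
	return dev->type == MODEL_ARPHRD_CAN &&
	       dev->priv_kind == MODEL_PRIV_CAN;
}
```

In this model, a bond device that went through model_setup_by_slave() with a
CAN slave reports the CAN type while model_can_priv_is_valid() returns 0.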
I was able to confirm the bisected commit, but the crashes were still
pure luck IMO.
can_rx_register() accesses netdev_priv() of the bonding device - but
there are no CAN filters. BAM!
So we need to make sure that devices with dev->type ARPHRD_CAN cannot be
enslaved by the bonding driver.
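A rough sketch of what such a check could look like, written as a
standalone userspace function so it can be compiled on its own. This is not
a tested patch: the sketch_* names are hypothetical, the struct is a
stripped-down stand-in for struct net_device, and -EINVAL is just an
assumed error code. In the kernel the test would presumably sit early in
bond_enslave(), before bond_setup_by_slave() is reached.

```c
#include <errno.h>

/* Values of ARPHRD_ETHER and ARPHRD_CAN as defined in if_arp.h. */
#define SKETCH_ARPHRD_ETHER	1
#define SKETCH_ARPHRD_CAN	280

/* Hypothetical, stripped-down stand-in for struct net_device. */
struct sketch_net_device {
	unsigned short type;	/* ARPHRD_* link type */
};

/*
 * Hypothetical helper: returns 0 if the slave may be enslaved and
 * -EINVAL (an assumed error code) if it is a CAN device.
 */
int sketch_bond_check_slave_type(const struct sketch_net_device *slave_dev)
{
	if (slave_dev->type == SKETCH_ARPHRD_CAN)
		return -EINVAL;

	return 0;
}
```

With a check like this, the enslave attempt on vxcan1 would be refused
before the bonding device takes over the CAN device type.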
Best regards,
Oliver