Hi Kurt,
On 24/01/2020 20.05, Kurt Van Dijck wrote:
On vr, 24 jan 2020 19:43:23 +0100, Oliver Hartkopp wrote:
Hi Kurt, Dmitry,
On 21/01/2020 21.39, Kurt Van Dijck wrote:
Maybe move the crosslinking to before the register, then they're
inaccessible from userspace.
I think I found the problem:
Well done!
[ 1814.648904] bond5128: (slave vxcan1): Error -22 calling dev_set_mtu
[ 1814.649124] dev_rcv_lists == NULL! 000000008e41fb06 (bond5128)
The bonding netdev bond5128 enslaved the vxcan1 netdev. As vxcan1 is a CAN
netdev with ARPHRD_CAN the bonding process executes
You were able to make the syscalls comprehensible then?
Not really. I was just digging into "what bonding CAN interfaces"
probably means to us :-)
The fact that we are only handling ARPHRD_CAN interfaces *and* the
dev_rcv_lists have not been available finally led to the problem.
	if (slave_dev->type != ARPHRD_ETHER)
		bond_setup_by_slave(bond_dev, slave_dev);
in bond_enslave() in .../bonding/bond_main.c
Which does this:
static void bond_setup_by_slave(struct net_device *bond_dev,
				struct net_device *slave_dev)
{
	bond_dev->header_ops = slave_dev->header_ops;
	bond_dev->type = slave_dev->type;
	bond_dev->hard_header_len = slave_dev->hard_header_len;
	bond_dev->addr_len = slave_dev->addr_len;
	memcpy(bond_dev->broadcast, slave_dev->broadcast,
	       slave_dev->addr_len);
}
So bond5128 becomes an ARPHRD_CAN interface BUT without having a
netdev_priv() space which contains our lovely can_ml_priv structure with the
dev_rcv_lists for the CAN filters.
I was able to confirm the bisected commit but the crashes still were pure
luck IMO.
can_rx_register() accesses netdev_priv() of the bonding device - but there
are no CAN filters. BAM!
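To illustrate the mismatch described above, here is a minimal userspace model, not kernel code. The names model_netdev, model_rcv_lists, model_setup_by_slave() and model_get_rcv_lists() are hypothetical stand-ins; only the ARPHRD_* values match the kernel headers. It shows that taking over the slave's type makes the bond look like a CAN device while it still has no receive lists:

```c
#include <assert.h>
#include <stddef.h>

#define ARPHRD_ETHER 1
#define ARPHRD_CAN   280

/* Hypothetical stand-in for the per-device CAN filter lists. */
struct model_rcv_lists {
	int filter_count;
};

/* Hypothetical stand-in for a net_device; not the kernel structure. */
struct model_netdev {
	unsigned short type;
	struct model_rcv_lists *rcv_lists; /* NULL unless a CAN driver set it up */
};

/* Models the relevant effect of bond_setup_by_slave(): the type is
 * copied from the slave, the CAN private data is not. */
void model_setup_by_slave(struct model_netdev *bond,
			  const struct model_netdev *slave)
{
	assert(bond && slave);
	bond->type = slave->type;
}

/* Models a lookup that trusts the device type: returns the lists for
 * CAN-typed devices, which is NULL if they were never allocated. */
struct model_rcv_lists *model_get_rcv_lists(const struct model_netdev *dev)
{
	if (dev->type != ARPHRD_CAN)
		return NULL;
	return dev->rcv_lists;
}
```

In this model a bond that enslaved a CAN device reports type ARPHRD_CAN, yet model_get_rcv_lists() returns NULL for it, which corresponds to the "dev_rcv_lists == NULL!" message in the log.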
So we need to make sure that devices with dev->type ARPHRD_CAN cannot be
enslaved by the bonding driver.
This implies modifying bond_main.c, right?
I think so. But I wanted to have this discussed on the mailing list
before preparing a patch.
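As a basis for that discussion, a rough userspace sketch of the kind of check meant here. This is not the actual patch: sketch_dev and sketch_enslave_check() are hypothetical names, and the choice of -EOPNOTSUPP as the error code is an assumption. In the kernel such a test would presumably sit early in bond_enslave(), before bond_setup_by_slave() is reached:

```c
#include <assert.h>
#include <errno.h>

#define ARPHRD_CAN 280

/* Hypothetical stand-in for the fields of net_device used here. */
struct sketch_dev {
	unsigned short type;
};

/* Refuse to enslave CAN devices; returns 0 if enslaving may proceed.
 * The error code is an assumption made for this sketch. */
int sketch_enslave_check(const struct sketch_dev *slave_dev)
{
	assert(slave_dev);
	if (slave_dev->type == ARPHRD_CAN)
		return -EOPNOTSUPP;
	return 0;
}
```

With a check like this the bond device would never take over ARPHRD_CAN as its type, so the CAN core would not see a CAN-typed netdev without dev_rcv_lists.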
Best,
Oliver