On 28.10.22 09:46, Marc Kleine-Budde wrote:
On 28.10.2022 09:13:09, Oliver Hartkopp wrote:
Hello,
On 28.10.22 05:33, Zhengchao Shao wrote:
It causes NULL pointer dereference when testing as following:
(a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket.
(b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan
link device, and bind vxcan device to bond device (can also use
ifenslave command to bind vxcan device to bond device).
(c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket.
(d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket.
The bond device invokes the can-raw protocol registration interface to
receive CAN packets. However, ml_priv is not allocated to the dev,
dev_rcv_lists is assigned to NULL in can_rx_register(). In this case,
it will occur the NULL pointer dereference issue.
I can see the problem and see that the patch makes sense for
can_rx_register().
But for me the problem seems to be located in the bonding device.
A CAN interface with dev->type == ARPHRD_CAN *always* has the dev->ml_priv
and dev->ml_priv_type set correctly.
I'm not sure if a bonding device does the right thing by just 'claiming' to
be a CAN device (by setting dev->type to ARPHRD_CAN) but not taking care of
being a CAN device and taking care of ml_priv specifics.
This might also be the case in other ml_priv use cases.
Would it probably make sense to blacklist CAN devices in bonding devices?
NACK - We had this discussion 2.5 years ago:
| https://lore.kernel.org/all/00000000000030dddb059c562a3f@xxxxxxxxxx
| https://lore.kernel.org/all/20200130133046.2047-1-socketcan@xxxxxxxxxxxx
...and davem pointed out:
| https://lore.kernel.org/all/20200226.202326.295871777946911500.davem@xxxxxxxxxxxxx
On 26.02.2020 20:23:26, David Miller wrote:
[...]
What I don't get is why the PF_CAN is blindly dereferencing a device
assuming what is behind bond_dev->ml_priv.
If it assumes a device it access is CAN then it should check the
device by comparing the netdev_ops or via some other means.
This restriction seems arbitrary.
With the addition of struct net_device::ml_priv_type in 4e096a18867a
("net: introduce CAN specific pointer in the struct net_device"), what
davem requested is now possible.
Marc
Oh, thanks for the heads up!
Didn't have remembered that specific discussion.
Wouldn't we need this check in can_rx_unregister() and maybe
can[|fd|xl]_rcv() then too?
As all these functions check for ARPHRD_CAN and later access ml_priv.
Best regards,
Oliver