Hello Jaroslav,

On 05/11/2019 22.46, Jaroslav Beran wrote:
> So far I've learned that this issue is most probably caused by the upper (net and can) layers, so it is not specific to a particular controller driver. When a driver calls can_bus_off, it sets carrier-off and triggers linkwatch_* actions that deactivate the net queues and replace the struct qdisc with `noop_qdisc`. Upon sending a frame, its enqueue function, noop_enqueue, just returns NET_XMIT_CN, which the net_xmit_errno macro turns into zero. That zero is passed by net/can/af_can.c:can_send up to the userspace caller of write as success.
Hm.
> According to the description of the qdisc return codes in include/linux/netdevice.h, NET_XMIT_CN stands for `congestion notification`, and further:
>
> /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It
>  * indicates that the device will soon be dropping packets, or already drops
>  * some packets of the same priority; prompting us to send less aggressively. */
>
> Is this behavior appropriate for a node in BUS-OFF state? I'd rather expect such a controller to always drop all frames (not just soon and some) until reset.
The generic net_xmit_errno macro probably really does not fit the peculiarities of CAN ...
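To make the mapping concrete, here is a small userspace sketch (not kernel code) of the path described above. The NET_XMIT_* values and the net_xmit_errno macro are reproduced as I read them in include/linux/netdevice.h; fake_xmit_carrier_off() and sketch_can_send() are hypothetical stand-ins for dev_queue_xmit() with noop_qdisc attached and for the error handling in can_send, so please treat this as an illustration only.

```c
#include <errno.h>

/* Values and macro as found in include/linux/netdevice.h (reproduced here
 * so the sketch builds in userspace). */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01 /* skb dropped */
#define NET_XMIT_CN      0x02 /* congestion notification */

#define net_xmit_errno(e) ((e) != NET_XMIT_CN ? -ENOBUFS : 0)

/* Hypothetical stand-in: what the transmit path yields once linkwatch has
 * swapped in noop_qdisc, whose enqueue returns NET_XMIT_CN. */
static int fake_xmit_carrier_off(void)
{
    return NET_XMIT_CN;
}

/* Hypothetical model of the return-code handling in can_send: positive
 * qdisc codes are translated by net_xmit_errno, negative errnos pass
 * through unchanged. */
static int sketch_can_send(int xmit_ret)
{
    int err = xmit_ret;

    if (err > 0)
        err = net_xmit_errno(err);

    return err; /* 0 is what write() finally reports as success */
}
```

Feeding fake_xmit_carrier_off() into sketch_can_send() yields 0, while NET_XMIT_DROP yields -ENOBUFS. So only the congestion-notification code is swallowed, and that is exactly the code noop_enqueue uses.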
> In the current situation, a caller of write gets success even if the frame is lost for sure. Is there any specific reason for this? Of course the caller can be notified by receiving an error frame, but why not just return an error in can_send?
Yes. It makes sense to forward the carrier-off state, which is thankfully provided by the linkwatch triggers, to user space.
Looking at the send(2) man page, we should return -ENOBUFS in the carrier-off state, right?
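Roughly what I have in mind, as an untested userspace model rather than a real patch: check the carrier state before looking at the qdisc return code. In the kernel the check would presumably use netif_carrier_ok() on the outgoing device; here the carrier state is just a bool parameter, and sketch_can_send_checked() is a made-up name for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Reproduced from include/linux/netdevice.h for a userspace build. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

#define net_xmit_errno(e) ((e) != NET_XMIT_CN ? -ENOBUFS : 0)

/* Hypothetical model of can_send with an added carrier check.
 * carrier_ok stands in for netif_carrier_ok(dev); xmit_ret stands in
 * for the value returned by the transmit path. */
static int sketch_can_send_checked(bool carrier_ok, int xmit_ret)
{
    int err;

    /* Assumed fix: in carrier-off (e.g. bus-off) the frame is certainly
     * lost, so report -ENOBUFS instead of trusting NET_XMIT_CN. */
    if (!carrier_ok)
        return -ENOBUFS;

    err = xmit_ret;
    if (err > 0)
        err = net_xmit_errno(err);

    return err;
}
```

With the carrier down, the caller gets -ENOBUFS regardless of what the qdisc returned; with the carrier up, the behaviour is unchanged.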
Would you like to provide a patch?

Best regards,
Oliver