Hi Marc and Dario,
(CC:ing patch author Dario)
On 9/5/22 17:54, Marc Kleine-Budde wrote:
On 01.09.2022 11:38:31, Jacob Kroon wrote:
I used "candump can0 -l" on the EG20T host to capture the traffic, and
then connected a CAN USB analyzer to the network and used that to
capture the traffic. One thing sticks out. This is the log from the
CAN USB analyzer:
Who generates these CAN messages?
The invalid frames in the logs are being sent from the EG20T host,
but some of them have also originated from the other nodes in the network.
...
505.7052;RX;0x464;3;0x01;0x01;0x00;;;;;;
505.7052;RX;0x464;3;0x00;0x00;0x00;;;;;;
505.7063;RX;0x65;64;;;;;;;;;
As Oliver pointed out, this doesn't look like a valid CAN frame. Is the
analyzer and/or sender configured for CAN-FD?
No, none of the nodes in the network are sending CAN-FD frames, they are
all normal CAN frames, max 8 bytes.
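For reference, a classic CAN frame carries at most 8 data bytes, so the
0x65 entry with a length of 64 cannot be a valid classic frame. Below is
a minimal Python sketch of that check. It assumes the analyzer export is
semicolon-separated as timestamp;direction;id;length;data bytes (the
field meaning is inferred from the log in this mail, not from the
analyzer's documentation), and the function names are made up for
illustration.

```python
CLASSIC_CAN_MAX_LEN = 8  # classic CAN (non-FD) payload limit in bytes


def parse_analyzer_line(line):
    """Parse one analyzer line; the field layout is assumed from the log."""
    fields = line.strip().split(";")
    timestamp = float(fields[0])
    direction = fields[1]
    can_id = int(fields[2], 16)
    length = int(fields[3])
    # Trailing empty fields are unused data slots, so skip them.
    data = bytes(int(b, 16) for b in fields[4:] if b)
    return timestamp, direction, can_id, length, data


def is_valid_classic_frame(length, data):
    """True if the length fits classic CAN and matches the data bytes seen."""
    return 0 <= length <= CLASSIC_CAN_MAX_LEN and len(data) == length


for line in (
    "505.7052;RX;0x464;3;0x00;0x00;0x00;;;;;;",
    "505.7063;RX;0x65;64;;;;;;;;;",
):
    _, _, can_id, length, data = parse_analyzer_line(line)
    verdict = "ok" if is_valid_classic_frame(length, data) else "INVALID"
    print(f"0x{can_id:X} len={length} {verdict}")
```

Running something like this over the full analyzer log would list every
entry that does not fit a classic frame, which may help to see how often
and at what times they occur.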
505.7662;RX;0x440;3;0x32;0x20;0xFA;;;;;;
505.7912;RX;0x44C;3;0x35;0x20;0xFA;;;;;;
505.9632;RX;0x464;3;0x00;0x00;0x00;;;;;;
505.9632;RX;0x464;3;0x01;0x01;0x00;;;;;;
505.9752;RX;0x468;3;0x51;0x20;0xFA;;;;;;
506.0362;RX;0x440;3;0x32;0x20;0xFA;;;;;;
506.0622;RX;0x44C;3;0x35;0x20;0xFA;;;;;;
506.2112;RX;0x464;3;0x00;0x00;0x00;;;;;;
506.2112;RX;0x464;3;0x00;0x00;0x00;;;;;;
506.2462;RX;0x468;3;0x51;0x20;0xFA;;;;;;
506.3072;RX;0x440;3;0x32;0x20;0xFA;;;;;;
506.3322;RX;0x44C;3;0x35;0x20;0xFA;;;;;;
506.4572;RX;0x464;3;0x00;0x00;0x00;;;;;;
506.4580;RX;0x464;3;0x00;0x00;0x00;;;;;;
506.5162;RX;0x468;3;0x51;0x20;0xFA;;;;;;
522.7203;RX;0x1E;1;0xFF;;;;;;;;
...
Note the third message from the top. This is what "candump" on the host
logs:
...
(1662022485.638794) can0 464#010100
(1662022485.638940) can0 464#000000
(1662022485.699405) can0 440#3220FA
(1662022485.725166) can0 44C#3520FA
(1662022485.896858) can0 464#000000
(1662022485.897382) can0 464#010100
(1662022485.909042) can0 468#5120FA
(1662022485.970036) can0 440#3220FA
(1662022485.995596) can0 44C#3520FA
(1662022486.144685) can0 464#000000
(1662022486.144768) can0 464#000000
(1662022486.179595) can0 468#5120FA
(1662022486.240561) can0 440#3220FA
(1662022486.266274) can0 44C#3520FA
(1662022486.391248) can0 464#000000
(1662022486.391469) can0 464#000000
(1662022486.450115) can0 468#5120FA
(1662022502.662035) can0 01E#FF
...
It fails to see the 3rd message from the previous log. What would that
indicate? The CAN analyzer sees the message, but the EG20T doesn't.
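The comparison of the two logs can also be automated. The following is
only a rough Python sketch: since the two capture devices use different
time bases, it ignores timestamps and aligns the logs by the ordered
sequence of (ID, payload) pairs using difflib from the standard library.
It assumes the analyzer field layout seen above and candump lines that
contain only classic data frames (no remote or CAN-FD entries). All
function names are hypothetical.

```python
from difflib import SequenceMatcher


def analyzer_frames(lines):
    """Extract (id, payload) pairs from analyzer lines (assumed layout)."""
    frames = []
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 4 or fields[1] != "RX":
            continue  # skip "..." and anything that is not a frame line
        can_id = int(fields[2], 16)
        data = bytes(int(b, 16) for b in fields[4:] if b)
        frames.append((can_id, data))
    return frames


def candump_frames(lines):
    """Extract (id, payload) pairs from "candump -l" style lines."""
    frames = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or "#" not in parts[2]:
            continue
        can_id, _, payload = parts[2].partition("#")
        frames.append((int(can_id, 16), bytes.fromhex(payload)))
    return frames


def missing_on_host(analyzer, host):
    """Return analyzer frames that have no counterpart in the host log."""
    matcher = SequenceMatcher(a=analyzer, b=host, autojunk=False)
    missing = []
    for tag, a_lo, a_hi, _, _ in matcher.get_opcodes():
        # "delete" and "replace" both cover analyzer frames left unmatched.
        if tag in ("delete", "replace"):
            missing.extend(analyzer[a_lo:a_hi])
    return missing


analyzer = analyzer_frames([
    "505.7052;RX;0x464;3;0x01;0x01;0x00;;;;;;",
    "505.7052;RX;0x464;3;0x00;0x00;0x00;;;;;;",
    "505.7063;RX;0x65;64;;;;;;;;;",
    "505.7662;RX;0x440;3;0x32;0x20;0xFA;;;;;;",
])
host = candump_frames([
    "(1662022485.638794) can0 464#010100",
    "(1662022485.638940) can0 464#000000",
    "(1662022485.699405) can0 440#3220FA",
])
for can_id, data in missing_on_host(analyzer, host):
    print(f"not seen by host: 0x{can_id:X} payload={data.hex()}")
```

On the excerpts above this reports only the 0x65 entry as missing on the
host side.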
Is this error somehow related to the "can0: can_put_echo_skb: BUG!
echo_skb 0 is occupied"?
Possibly.
What I do know is that if I revert commit:
"can: c_can: cache frames to operate as a true FIFO"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=387da6bc7a826cc6d532b1c0002b7c7513238d5f
then everything looks good. I don't get any BUG messages, and the host
has been running overnight without problems, so it seems to have fixed
the network interface lockup as well.
Jacob