On 9/2/22 17:13, Jacob Kroon wrote:
Hi Oliver,
On 9/1/22 18:35, Oliver Hartkopp wrote:
Hi Jacob,
On 01.09.22 11:38, Jacob Kroon wrote:
On 8/30/22 21:15, Oliver Hartkopp wrote:
I assume you have a transceiver, right? ;-)
Yes, all nodes are using a TJA1050 transceiver
(https://www.nxp.com/docs/en/data-sheet/TJA1050.pdf)
Good!
What is the other endpoint? The EG20T and another (automotive) ECU?
Currently I have 4 nodes in the network; the EG20T is at one end.
Ok, that's a good base for testing.
Do you have another CAN node which can be attached to the EG20T
setup (e.g. some ECU or a USB CAN adapter)?
Yes, I do have a CAN analyzer from Microchip. I guess I can record
all traffic with the analyzer, and compare it to what I see with
"candump can0" on the host. Or do you have some other suggestion?
Yes, please add the CAN analyzer from Microchip too!
The problem with only two nodes is that you have to be very precise
with bitrate settings and sampling points so that the receiving node
can properly set the ACK bit to acknowledge the CAN frame.
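To make the dependency on bitrate and sample point concrete, here is a
minimal sketch of the usual CAN bit-timing arithmetic: one bit is the
sync segment (1 time quantum) plus the propagation and phase segments,
and the sample point sits at the end of phase segment 1. The struct and
function names are made up for illustration, and the clock value in the
note below is an assumption, not the EG20T's actual clock.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-timing parameters, expressed in time quanta (tq). */
struct bit_timing {
	uint32_t clock_hz;   /* CAN controller clock (assumed by the caller) */
	uint32_t brp;        /* bit-rate prescaler */
	uint32_t prop_seg;   /* propagation segment, in tq */
	uint32_t phase_seg1; /* phase buffer segment 1, in tq */
	uint32_t phase_seg2; /* phase buffer segment 2, in tq */
};

/* One bit = sync segment (always 1 tq) + prop_seg + phase_seg1 + phase_seg2. */
uint32_t tq_per_bit(const struct bit_timing *bt)
{
	return 1 + bt->prop_seg + bt->phase_seg1 + bt->phase_seg2;
}

/* Nominal bitrate in bit/s resulting from the clock and the prescaler. */
uint32_t bitrate_bps(const struct bit_timing *bt)
{
	assert(bt->brp > 0);
	return bt->clock_hz / (bt->brp * tq_per_bit(bt));
}

/* Sample point in tenths of a percent, e.g. 875 means 87.5 %. */
uint32_t sample_point_permille(const struct bit_timing *bt)
{
	return 1000 * (1 + bt->prop_seg + bt->phase_seg1) / tq_per_bit(bt);
}
```

With an assumed 8 MHz clock, brp = 1, prop_seg = 6, phase_seg1 = 7 and
phase_seg2 = 2, this gives 16 tq per bit, 500000 bit/s and a sample
point of 87.5 %. Two nodes whose values differ noticeably here may fail
to acknowledge each other's frames.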
I had been working with an MSCAN system some time ago and that wasn't
able to talk to a commercial CAN tool until I added another node
(from another CAN tool provider).
Maybe you can make the other node talk to the Microchip CAN analyzer
and let the EG20T receive that traffic first.
I used "candump can0 -l" on the EG20T host to capture the traffic,
and then connected a CAN USB analyzer to the network and used that
to capture the traffic. One thing sticks out. This is the log from
the CAN USB analyzer:
...
505.7052;RX;0x464;3;0x01;0x01;0x00;;;;;;
505.7052;RX;0x464;3;0x00;0x00;0x00;;;;;;
505.7063;RX;0x65;64;;;;;;;;;
What should this be?
A length of 64 and no data??
This is not a valid CAN frame: classic CAN carries at most 8 data bytes.
505.7662;RX;0x440;3;0x32;0x20;0xFA;;;;;;
(..)
Note the third message from the top. This is what "candump" on the
host logs:
...
(1662022485.638794) can0 464#010100
(1662022485.638940) can0 464#000000
(1662022485.699405) can0 440#3220FA
The valid CAN frames are displayed correctly.
...
It fails to see the 3rd message from the previous log. What would
that indicate? The CAN analyzer sees the message, but the EG20T
doesn't.
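For going through a longer analyzer capture, a small checker along
these lines could flag such lines automatically. This is only a sketch:
it assumes the analyzer's columns are time;direction;id;length;data
bytes, which is inferred from the excerpt above rather than taken from
the tool's documentation, and the function name and return codes are
made up.

```c
#include <assert.h>
#include <stdio.h>

#define CLASSIC_CAN_MAX_DLEN 8 /* classic CAN payload limit in bytes */

/*
 * Check one analyzer line of the assumed form
 *   <time>;<dir>;<id>;<len>;<b0>;<b1>;...
 * Returns 0 for a plausible classic CAN frame, -1 if the header cannot
 * be parsed, -2 if the length exceeds 8, and -3 if fewer data bytes
 * are present than the length field claims.
 */
int check_analyzer_line(const char *line)
{
	double ts;
	char dir[4];
	unsigned int id, len, byte;
	int consumed = 0;

	assert(line != NULL);
	if (sscanf(line, "%lf;%3[^;];%x;%u%n", &ts, dir, &id, &len, &consumed) != 4)
		return -1;
	if (len > CLASSIC_CAN_MAX_DLEN)
		return -2;

	/* Walk the data columns and make sure each claimed byte is there. */
	const char *p = line + consumed;
	for (unsigned int i = 0; i < len; i++) {
		int n = 0;

		if (sscanf(p, ";%x%n", &byte, &n) != 1)
			return -3;
		p += n;
	}
	return 0;
}
```

Fed the lines from the excerpt, the 0x464 and 0x440 lines return 0 and
the 0x65 line with length 64 returns -2.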
I don't know if this is an error on the CAN bus. You can also log
error messages for detected CAN bus problems by adding an error
mask to the filter.
See 'candump -h':
candump -l any,0:0,#FFFFFFFF
(log error frames and also all data frames)
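As a rough illustration of what the "0:0" and "#FFFFFFFF" parts select,
here is a sketch of the matching rules as I understand them from the
SocketCAN documentation: a data frame passes a filter when
(received_id & mask) == (filter_id & mask), and an error frame is
reported when one of its error class bits is set in the error mask.
The constants are redefined locally with a SKETCH_ prefix so the
example builds without kernel headers, and the function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the bit values used in <linux/can.h>. */
#define SKETCH_CAN_ERR_FLAG 0x20000000U /* set in can_id of error frames */
#define SKETCH_CAN_ERR_MASK 0x1FFFFFFFU /* error class bits in can_id */

/* A frame passes when the masked IDs are equal; mask 0 passes everything. */
bool filter_matches(uint32_t received_id, uint32_t filter_id, uint32_t mask)
{
	return (received_id & mask) == (filter_id & mask);
}

/* An error frame is reported when an error class bit hits the error mask. */
bool error_frame_selected(uint32_t can_id, uint32_t err_mask)
{
	if (!(can_id & SKETCH_CAN_ERR_FLAG))
		return false; /* not an error frame at all */
	return (can_id & err_mask & SKETCH_CAN_ERR_MASK) != 0;
}
```

With filter "0:0" every data frame matches, and with an error mask of
0xFFFFFFFF every error class is selected, which is why the command
above logs both.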
Thank you Oliver for all the good hints.
I've done some more logging, but there are no error frames being logged.
I can see that both the pch_can and c_can_pci drivers call
can_put_echo_skb() in their ndo_start_xmit functions, but neither checks
the return value to see whether the call succeeded. Shouldn't both of
these return NETDEV_TX_BUSY if there are no echo slots available?
One reason I ask is that whenever I strace the application, the problem
seems to go away, and I'm guessing that running under strace slows down
my application.
I did try the patch below, but then I just get the lockups without the
warning messages:
diff --git a/drivers/net/can/pch_can.c b/drivers/net/can/pch_can.c
index 964c8a09226a..0a230368c443 100644
--- a/drivers/net/can/pch_can.c
+++ b/drivers/net/can/pch_can.c
@@ -889,6 +889,10 @@ static netdev_tx_t pch_xmit(struct sk_buff *skb, struct net_device *ndev)
 		return NETDEV_TX_OK;
 
 	tx_obj_no = priv->tx_obj;
+
+	if (priv->can.echo_skb[tx_obj_no - PCH_RX_OBJ_END - 1])
+		return NETDEV_TX_BUSY;
+
 	if (priv->tx_obj == PCH_TX_OBJ_END) {
 		if (ioread32(&priv->regs->treq2) & PCH_TREQ2_TX_MASK)
 			netif_stop_queue(ndev);
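To spell out the intent of that check outside the kernel, here is a
small user-space model: a transmit is refused while the echo slot it
would use is still occupied by a frame that has not completed. All
names are hypothetical stand-ins, the slot count of 6 is an assumption
for illustration, and this is not the driver's actual bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY. */
enum tx_result { TX_OK, TX_BUSY };

#define ECHO_SLOTS 6 /* assumed number of TX objects, illustration only */

struct echo_model {
	bool in_flight[ECHO_SLOTS]; /* true while a frame awaits TX completion */
	size_t next;                /* slot the next transmit will use */
};

/* Refuse to queue a frame while the slot it would use is still occupied. */
enum tx_result model_xmit(struct echo_model *m)
{
	assert(m->next < ECHO_SLOTS);
	if (m->in_flight[m->next])
		return TX_BUSY;
	m->in_flight[m->next] = true;
	m->next = (m->next + 1) % ECHO_SLOTS;
	return TX_OK;
}

/* Called when the hardware reports that the frame in `slot` was sent. */
void model_tx_complete(struct echo_model *m, size_t slot)
{
	assert(slot < ECHO_SLOTS);
	m->in_flight[slot] = false;
}
```

In this model, a sender that queues frames faster than completions
arrive gets TX_BUSY on the seventh call, and succeeds again once slot 0
has completed.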
Jacob