Re: Outstanding latency increase in kernel CAN gateway caught by CANlatester daily builds at 2023-10-02

Hello Pavel,

is there any news on this latency issue?

I've not seen any can-gw related changes between 6.2 and 6.6.

The only change for linux/net/can/gw.c is this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a30b2bd01c23a7eeace3a3f82c2817227099805

It intentionally returns an error when the cangw tool is used incorrectly:

From: Oliver Hartkopp <socketcan@xxxxxxxxxxxx>
Date: Wed, 25 Jan 2023 06:54:07 +0100
Subject: can: gw: give feedback on missing CGW_FLAGS_CAN_IIF_TX_OK flag

To send CAN traffic back to the incoming interface a special flag has to
be set. When creating a routing job for identical interfaces without this
flag the rule is created but has no effect.

This patch adds an error return value in the case that the CAN interfaces
are identical but the CGW_FLAGS_CAN_IIF_TX_OK flag was not set.

Reported-by: Jannik Hartung <jannik.hartung@xxxxxxxx>
Signed-off-by: Oliver Hartkopp <socketcan@xxxxxxxxxxxx>
Link: https://lore.kernel.org/all/20230125055407.2053-1-socketcan@xxxxxxxxxxxx
Signed-off-by: Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx>
---
 net/can/gw.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/can/gw.c b/net/can/gw.c
index 23a3d89cad81d..37528826935e7 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -1139,6 +1139,13 @@ static int cgw_create_job(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (gwj->dst.dev->type != ARPHRD_CAN)
 		goto out;

+	/* is sending the skb back to the incoming interface intended? */
+	if (gwj->src.dev == gwj->dst.dev &&
+	    !(gwj->flags & CGW_FLAGS_CAN_IIF_TX_OK)) {
+		err = -EINVAL;
+		goto out;
+	}
+
 	ASSERT_RTNL();

 	err = cgw_register_filter(net, gwj);

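For illustration, here is a minimal shell sketch of what the check means for cangw users: a rule whose source and destination interfaces are identical needs the -i option of cangw (can-utils), which sets CGW_FLAGS_CAN_IIF_TX_OK. The helper name cangw_rule_cmd and the interface names can0/can1 are made up for this example, and the helper only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the cangw command for a src -> dst routing rule.
# When both interfaces are identical, append -i (allow routing to the
# incoming interface). Without it, a kernel containing the patch above is
# expected to reject the rule with EINVAL.
cangw_rule_cmd() {
    src=$1
    dst=$2
    cmd="cangw -A -s $src -d $dst"
    if [ "$src" = "$dst" ]; then
        cmd="$cmd -i"
    fi
    printf '%s\n' "$cmd"
}

cangw_rule_cmd can0 can1   # different interfaces, no extra option needed
cangw_rule_cmd can0 can0   # same interface, -i is appended
```

The printed command still has to be run as root on a system with the can-gw module loaded; a real rule would usually also carry filters or modifications, which are left out here.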
Please let me know if I can help with this topic.
So far it looks like an RT/configuration problem to me.

Best regards,
Oliver

On 02.10.23 10:40, Pavel Pisa wrote:
Hello Oliver and others,

two consecutive daily runs of our CAN latency testing system

   https://canbus.pages.fel.cvut.cz/#can-bus-channels-mutual-latency-testing

show an extreme increase in the latency of the kernel CAN gateway under load.
The first run with increased latency (named run-DATE-TIME-KERNEL-OPTIONS) is

run-231002-045216-hist+6.6.0-rc3-rt5-ge31516c1e553+flood-kern-prio-fd-load.json

and the previous one, consistent with the daily runs since May, is

run-231001-045220-hist+6.6.0-rc3-rt5-ge31516c1e553+flood-kern-prio-fd-load.json

The monitoring history of the kernel gateway under load for the latest RT
kernels (the "linux-rt-devel/for-kbuild-bot/current-stable" branch) is at

https://canbus.pages.fel.cvut.cz/can-latester/inspect.html?kernel=rt&load=1&flood=1&fd=1&prio=1&kern=1

Latency monitoring with a userspace application forwarding data from one
CAN interface to another does not show a similar increase:

https://canbus.pages.fel.cvut.cz/can-latester/inspect.html?kernel=rt&load=1&flood=1&fd=1&prio=1&kern=0

It is interesting that the problem does not appear when the priority of the
CAN controller interrupt service routines is not boosted. Priority 90 is set
for each irq/[0-9]+-can[0-9] thread by

   chrt -f --pid 90 $pid
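
As a rough sketch, the boost could be applied to all matching threads as shown below. This is not the script used on the test system. The helper names is_can_irq_thread and boost_can_irq_threads are made up for this example, and it assumes a procps-style ps and that the IRQ threads are named as in the pattern above.

```shell
#!/bin/sh
# Succeeds if the given thread name looks like a CAN controller IRQ thread,
# using the irq/[0-9]+-can[0-9] pattern quoted above.
is_can_irq_thread() {
    printf '%s\n' "$1" | grep -Eq '^irq/[0-9]+-can[0-9]'
}

# Hypothetical helper: set SCHED_FIFO priority 90 on every matching thread.
# Needs root; prints each thread it touches.
boost_can_irq_threads() {
    ps -e -o pid= -o comm= | while read -r pid comm; do
        if is_can_irq_thread "$comm"; then
            echo "boosting $comm (pid $pid)"
            chrt -f --pid 90 "$pid"
        fi
    done
}
```

Calling boost_can_irq_threads as root after the CAN interfaces are up applies the priority; skipping the call gives the non-boosted configuration for comparison.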

The device under test as well as the message generation and monitoring
system are MZ_APO boards (AMD/Xilinx Zynq XC7Z010) with the CTU CAN FD IP core
CAN controller configured for 10 ns frame timestamping.

The problem may lie in the configuration of our system or in the CTU CAN FD IP
core driver, or it may be specific to the Zynq ARM platform. But it is
generally suspicious, because the test system has not been modified for a long
time after its initial tuning. The monitoring system has been running the
6.2.0-rt3-00007-ge3a16816f987 kernel the whole time, and no Rx buffer overflow
on the tester side has been reported for the period covering all the tests in
question.

Please let me know if you have any idea which change between the reported
versions from 2023-10-01 and 2023-10-02 could be the reason for the change.
I plan to keep an eye on the results until the end of the week and, if the
problem continues, to investigate further at the beginning of next week, when
I should find a little more time. I am quite busy preparing for a conference
and teaching this week, so I do not expect to find much time.

Best wishes,

                 Pavel



