Re: Excessive network latency when using Realtek R8168/R8111 et al NIC


 



On Tue, 23 May 2023, Rod Webster wrote:

Date: Tue, 23 May 2023 06:02:13 +1000
From: Rod Webster <rod@xxxxxxxxxx>
To: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>,
    linux-rt-users@xxxxxxxxxxxxxxx
Subject: Re: Excessive network latency when using Realtek R8168/R8111 et al
    NIC

This stuff is hard! I just realised that rtapi_app is a red herring!
rtapi_app is LinuxCNC and there is nothing wrong with it. Its thread
runs on a 1000us cycle, so it seems it gets all its jobs done in 200us
and then sleeps for 800us, which makes perfect sense!

The issue we have is deeper than that. I think we should be looking at
the NIC interrupt (but don't trust the novice!).
From time to time the network communication consumes more than the
800us of slack. When that happens, our hardware sees the timing overrun
and increments an internal packet error count. If too many of these
happen in succession, the hardware decides the RT environment can't be
relied on, disables further communication and returns an "error
finishing read" to LinuxCNC to say it has given up.

Marcelo, we didn't resort to C. We were able to use a bash script and
a LinuxCNC tool called halcmd to query the hardware, as shown here:
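#!/usr/bin/env bash
# Poll the hm2_7i96s packet error counter until the first error appears,
# then stop tracing so the trace buffer keeps the events leading up to it.
stat=0
while (( stat < 1 ))
do
    stat=$(halcmd getp hm2_7i96s.0.packet-error-total)
done
trace-cmd stop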

I think we need to increase the stat threshold so we get more samples
in our trace before stopping it. The current trace will only capture one
instance of the overrun.
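A minimal sketch of the same polling loop with the threshold as a parameter, assuming bash and the hm2_7i96s.0.packet-error-total pin from the script above. The function name wait_for_packet_errors and the default threshold of 5 are hypothetical illustrations, not LinuxCNC conventions.

```bash
# Hypothetical helper: poll a HAL pin until the cumulative packet error count
# reaches a threshold, then stop tracing with trace-cmd (buffers are kept).
# Assumption: the pin name comes from the script above; the default
# threshold of 5 is an arbitrary illustration, not a recommended value.
PACKET_ERROR_PIN="hm2_7i96s.0.packet-error-total"

wait_for_packet_errors() {
    local threshold=${1:-5}
    local stat=0
    while (( stat < threshold )); do
        # Give up (without stopping the trace) if halcmd itself fails,
        # e.g. because HAL is not running.
        stat=$(halcmd getp "$PACKET_ERROR_PIN") || return 1
    done
    trace-cmd stop
}
```

After sourcing it, run e.g. `wait_for_packet_errors 10`. Because packet-error-total is cumulative, a higher threshold lets several overruns land in one trace, but only if the ring buffer is large enough to keep the earliest ones; `trace-cmd start -b` sets the per-CPU buffer size in kilobytes.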
Thanks for letting me see the issue more clearly.


Rod Webster



I should note that, at least for Intel MACs, the 6.3.1-rt13 and 6.4.0-rc2-rt1 kernels seem to solve the issue. I'm not sure what changed, but the maximum read time is now in the 200-250 usec peak region (about 100 usec more than average). This is the peak read latency after about 3 days of videos, compiling and local network activity.

Sadly, 6.4.0-rc3-rt2 has regressed slightly in network latency on my test systems.

My test systems were all Intel CPUs with 4 cores, isolcpus=3 and the Ethernet
IRQ pinned to CPU3.
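The isolcpus=3 plus IRQ-on-CPU3 arrangement can be sketched in bash as below, under stated assumptions: the function names nic_irqs and pin_nic_irqs and the example interface name enp1s0 are hypothetical, and the affinity is set through the standard /proc/irq/N/smp_affinity_list file, which may not be how these test systems were actually configured.

```bash
# Hypothetical helpers for a setup with isolcpus=3 on the kernel command
# line (check with: cat /proc/cmdline) and the Ethernet IRQ pinned to CPU3.
# Assumption: the interface name you pass must match the name the NIC's
# interrupt(s) show in the last column of /proc/interrupts.

# Print the IRQ number of every /proc/interrupts line whose last column
# starts with $1 (so multi-queue names like enp2s0-TxRx-0 also match).
nic_irqs() {
    local name=$1 file=${2:-/proc/interrupts}
    awk -v n="$name" '$1 ~ /^[0-9]+:$/ && index($NF, n) == 1 { sub(/:$/, "", $1); print $1 }' "$file"
}

# Write the CPU list (e.g. "3") to smp_affinity_list for each matching IRQ.
# Needs root.
pin_nic_irqs() {
    local name=$1 cpu=$2 irqs irq
    irqs=$(nic_irqs "$name")
    if [[ -z $irqs ]]; then
        echo "no IRQ found for $name" >&2
        return 1
    fi
    for irq in $irqs; do
        echo "$cpu" > "/proc/irq/$irq/smp_affinity_list" || return 1
    done
}
```

For example, as root: `pin_nic_irqs enp1s0 3`. If irqbalance is running it may move the interrupt again, so it is worth re-checking /proc/irq/N/smp_affinity_list afterwards.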


Peter Wallace


