Re: Excessive network latency when using Realtek R8168/R8111 et al NIC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This stuff is hard! I just realised that rtapi_app is a red herring!
rtapi_app is Linuxcnc and there is nothing wrong with it. Its thread
is on a 1000us cycle so it seems it gets all its jobs done in 200us
and then sleeps for 800us which makes perfect sense!

The issue we have is deeper than that. I think we should be looking at
the NIC interrupt (but don't trust the novice!).
The network communication is consuming more than the 800us slack from
time to time. When that happens, our hardware sees the timing overrun
and increments an internal packet error count. If too many of these
happen in succession, the hardware decides the RT environment can't be
relied on, disables further communication and returns an "error
finishing read" to Linuxcnc to say it's given up.

Marcelo, we didn't resort to C. We were able to use a bash script and
use a linuxcnc tool called halcmd to query the hardware as shown here.
#!/usr/bin/bash
stat=0
while (($stat < 1))
do
stat=`halcmd getp hm2_7i96s.0.packet-error-total`
done
trace-cmd stop

I think we need to increase the stat threshold  so we get more samples
in our trace before stopping it. The current trace will only have one
instance.
Thanks for letting me see the issue more clearly.


Rod Webster



Rod Webster

VMN®

www.vmn.com.au

Ph: 1300 896 832

Mob: +61 435 765 611




On Tue, 23 May 2023 at 03:13, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>
> On Mon, May 22, 2023 at 08:06:39PM +1000, Rod Webster wrote:
> > Yes, I won't be using a DKMS driver and will test with the in-tree driver.
> > I rolled a PC back to the Debian 6.1 RT kernel and the default in-tree
> > R8169 driver yesterday with isolcpus=2,3.
> >
> > I had a shot at it last night with the napi thread but was confused by
> > the trace results so will try again.
> > I could not see our PacketErrorTest script appearing in the trace but
> > it stopped the trace when it should.
>
> Rod,
>
> Can you describe the PacketErrorTest script and how do you know, from
> it, that latency has crossed the threshold? What was confusing about
> the traces?
>
> Earlier you wrote:
> I've had some feedback from a hardware supplier.
> The Linuxcnc RT ethernet driver has a pin that reports ethernet packet
> error count.
> This increments when the ethernet response exceeds our real time servo
> thread period (typically 1 ms)
> We should be able to monitor this pin and stop the trace if it is >  0
> without too much effort.
>
> So you are seeing that pin increase its value?
>
> When you do, you should write to the "tracing_on" file (a zero):
>
>         /*
>          * if we're not tracing and the tracing_on fd is not open,
>          * open the tracing_on file so that we can stop the trace
>          * if we hit a breaktrace threshold
>          */
>         if (trace_fd < 0) {
>                 sprintf(path, "%s/%s", fileprefix, "tracing_on");
>                 if ((trace_fd = open(path, O_WRONLY)) < 0)
>                         warn("unable to open tracing_on file: %s\n", path);
>         }
>
>         /* now stop any trace */
>         write(trace_fd, "0\n", 2);
>
> The trace file (/sys/kernel/debug/tracing/trace) should now contain
> the trace up to the write to tracing_on file.
>
> Sebastian: didnt see anything weird in the traces as well:
>
> "Follows the delta between a list of timestamped events,
> where each event is: "TIMESTAMP: rtapi is scheduled IN CPU-3"
> (there is at one time only a single process scheduled in cpu-3).
>
> [191.050245, 191.050278]: 0.000033
> [191.050278, 191.050293]: 0.000015
> [191.050293, 191.050307]: 0.000014
> [191.050307, 191.050322]: 0.000015
> [191.050322, 191.050336]: 0.000014
>
> rtapi_app goes to sleep for approximately 15us, and is awakened
> by a timer.
>
> [191.050336, 191.050350]: 0.000014
> [191.050350, 191.050365]: 0.000015
> [191.050365, 191.050379]: 0.000014
> [191.050379, 191.050393]: 0.000014
> [191.050393, 191.050408]: 0.000015
> [191.050408, 191.051243]: 0.000835
> [191.051243, 191.051272]: 0.000029
>
> It eventually sleeps for 800us (and this pattern repeats)".
>
> Perhaps tracing the system call which sends out the packet would be
> useful as well.
>
> > Also our RT process did not appear to be on an isolated thread which
> > seemed odd. I did not get as far as setting the affinity for napi.
> >
> > I've also installed  Bookworm RC3 on another PC with the default RT
> > kernel on a Gigabyte Brix N3160 with an R8111/R8168 NIC and the
> > in-tree R8169 driver.
> > This PC historically had max latency of about 130000 ns  back with
> > kernel 4.x and now its reporting 23000 ns with the 6.1 kernel over 24
> > hours testing with the linuxcnc tools. Quite amazing what you guys
> > have achieved! It will be a few days before I can lug my hardware to
> > work to test on it as well for another data point.
> >
> > I really appreciate your willingness to help. This remains a
> > frustrating issue for many Linuxcnc users who are using my compiled
> > 6.3 kernel to get out of jail.
> > Bear with me as I bumble my way through!
> >
> > Rod Webster
> >
> >
> >
> >
> >
> > Rod Webster
> >
> > VMN®
> >
> > www.vmn.com.au
> >
> > Ph: 1300 896 832
> >
> > Mob: +61 435 765 611
> >
> >
> >
> >
> > On Mon, 22 May 2023 at 19:32, Sebastian Andrzej Siewior
> > <bigeasy@xxxxxxxxxxxxx> wrote:
> > >
> > > On 2023-05-19 21:41:30 [+1000], Rod Webster wrote:
> > > > I will do some testing tomorrow now I know what's going on but I
> > > > wonder how relevant it will be given this turnaround?
> > >
> > > I'm not going to look at the dkms/OOT driver. If there is an actual
> > > problem or something odd with mainline and RT and I will look into it.
> > > I managed to find an older box which is using the r8169 and I'm going to
> > > boot and see if there the driver is doing something odd what I didn't
> > > see earlier…
> > >
> > > Please make sure that you are using the in-tree driver. If you
> > > experience high latencies then I'm willing to help investigate.
> > >
> > > > Rod Webster
> > >
> > > Sebastian
> >
> >
>




[Index of Archives]     [RT Stable]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux