Peter, thanks for joining the conversation.
For the benefit of others, Peter is the manufacturer of the Mesa network
hardware we are working with. It's great to have him on board.

Rod Webster

VMN®
www.vmn.com.au
Ph: 1300 896 832
Mob: +61 435 765 611

On Tue, 23 May 2023 at 06:38, Peter Wallace <pcw@xxxxxxxxxxx> wrote:
>
> On Tue, 23 May 2023, Rod Webster wrote:
>
> > Date: Tue, 23 May 2023 06:02:13 +1000
> > From: Rod Webster <rod@xxxxxxxxxx>
> > To: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>,
> >     linux-rt-users@xxxxxxxxxxxxxxx
> > Subject: Re: Excessive network latency when using Realtek R8168/R8111 et al
> >     NIC
> >
> > This stuff is hard! I just realised that rtapi_app is a red herring!
> > rtapi_app is Linuxcnc and there is nothing wrong with it. Its thread
> > is on a 1000us cycle so it seems it gets all its jobs done in 200us
> > and then sleeps for 800us which makes perfect sense!
> >
> > The issue we have is deeper than that. I think we should be looking at
> > the NIC interrupt (but don't trust the novice!).
> > The network communication is consuming more than the 800us slack from
> > time to time. When that happens, our hardware sees the timing overrun
> > and increments an internal packet error count. If too many of these
> > happen in succession, the hardware decides the RT environment can't be
> > relied on, disables further communication and returns an "error
> > finishing read" to Linuxcnc to say it's given up.
> >
> > Marcelo, we didn't resort to C. We were able to use a bash script and
> > use a linuxcnc tool called halcmd to query the hardware as shown here.
> > #!/usr/bin/bash
> > stat=0
> > while (($stat < 1))
> > do
> >    stat=`halcmd getp hm2_7i96s.0.packet-error-total`
> > done
> > trace-cmd stop
> >
> > I think we need to increase the stat threshold so we get more samples
> > in our trace before stopping it. The current trace will only have one
> > instance.
> > Thanks for letting me see the issue more clearly.
> >
> >
> > Rod Webster
> >
> >
> I should note that at least for Intel MACs, the 6.3.1-rt13 and 6.4.0-rc2-rt1
> kernels seem to solve the issue. Not sure what changed but maximum read time
> is now in the 200.. 250 usec peak region (about 100 usec more than average)
> This is the peak read latency after about 3 days of videos, compiling and
> local network activity.
>
> Sadly 6.4.0-rc3-rt2 has regressed slightly in network latency on my test
> systems
>
> My test systems were all Intel CPUs with 4 cores, isolcpus=3 and the Ethernet
> IRQ pinned to CPU3
>
>
> Peter Wallace
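
The polling script quoted earlier in the thread stops the trace on the first
packet error. A variant that waits for several errors, as suggested there,
might look like the sketch below. It reuses only the halcmd getp and
trace-cmd stop calls from the quoted script; the function name
wait_for_packet_errors, the THRESHOLD variable and its default of 5 are
assumptions made for illustration, not values from the thread.

```bash
#!/usr/bin/bash
# Hypothetical variant of the quoted script: keep polling until the
# packet error total reaches a threshold, then report the final count.
# THRESHOLD and its default of 5 are illustrative assumptions.
THRESHOLD=${THRESHOLD:-5}

wait_for_packet_errors() {
    local threshold=$1
    local stat=0
    # Treat an empty reading as 0 so the comparison stays well formed.
    while [ "${stat:-0}" -lt "$threshold" ]; do
        # Same query as the quoted script, read from the same HAL item.
        stat=$(halcmd getp hm2_7i96s.0.packet-error-total)
    done
    echo "$stat"
}

# Usage, left commented out so the function can be sourced safely:
# wait_for_packet_errors "$THRESHOLD" && trace-cmd stop
```

Since the hardware disables communication after too many consecutive errors,
the count may stop rising before a large threshold is reached, in which case
this loop would not terminate. A small threshold is probably the safer
starting point.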