Re: cyclictest result variations

"Luis Claudio R. Goncalves" <lclaudio@xxxxxxxx> · Thu, 29 Mar 2018 10:54:28 -0300

On Thu, Mar 29, 2018 at 07:30:27AM +0000, Phil Edworthy wrote:
> Hi John, Clark,
> 
> On 28 March 2018 16:32, Clark Williams wrote:
> > On Wed, 28 Mar 2018 16:56:27 +0200
> > John Ogness <john.ogness@xxxxxxxxxxxxx> wrote:
> > 
> > > On 2018-03-28, Phil Edworthy <phil.edworthy@xxxxxxxxxxx> wrote:
> > > >> > I found that cyclictest results vary from one run to another.
> > > >> >
> > > >> > [...]
> > > >> >
> > > >> > Is it common knowledge that cyclictest results vary so much from
> > > >> > one run to another? Any ideas how to mitigate this?
> > > >>
> > > >> It would be helpful if you provided the command arguments you use
> > > >> for your tests. Particularly important options to consider:
> > > >>
> > > >>     -a / --affinity
> > > >>     -m / --mlockall
> > > >>     -n / --nanosleep
> > > >>     -t / --threads
> > > >>          --secaligned
> > > >>
> > > >> and of course giving it an appropriate realtime priority:
> > > >>
> > > >>     -p / --priority
> > > >
> > > > Sure:
> > > > cyclictest -m -n -Sp99 -i200 -h300 -M -D 10h
> > >
> > > I would recommend using prio 98 instead of 99. In general,
> > > applications should not be taking the CPU from the migration or
> > > watchdog tasks. And usually you want cyclictest to reflect the
> > > latencies of real applications.
> > 
> > Agree, please don't use fifo:99. Honestly there's no difference between
> > fifo:51 and fifo:98. The interrupt threads default to fifo:50, so you want to be
> > above that but no real need to contend with migration, watchdog or posix
> > timers.
> OK, I have changed the priority to 98; I can see no difference in the results.
> 
> I did some overnight tests: 100 runs of cyclictest, each running for 1 minute.
> The stats below were calculated using the stats package from http://web.cs.wpi.edu/~claypool/misc/stats/stats.html
> 
> 1. Interval fixed to 400us, not using --secaligned
> Min: 20  Avg: 37  Max: 187  (mean of the 100 per-run Max values is 134)
> 
> 2. Interval fixed to 400us, using --secaligned
> Min: 20  Avg: 37  Max: 177  (mean of the 100 per-run Max values is 150)
> 
> 3. Interval increases from 400us to 499us, not using --secaligned
> Min: 20  Avg: 37  Max: 211  (mean of the 100 per-run Max values is 157)
> 
> 4. Interval increases from 400us to 499us, using --secaligned
> Min: 20  Avg: 37  Max: 202  (mean of the 100 per-run Max values is 157)
> 
> While --secaligned may provide more consistent results, it appears to be
> less effective at identifying the worst-case latency.
> Testing different intervals appears to be much better at identifying the
> worst-case latency.
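
The aggregation quoted above (overall Min/Avg/Max plus the mean of the
per-run Max values) can also be reproduced with a few lines of Python
instead of an external stats package. This is only a minimal sketch: the
function names parse_run and summarize are hypothetical, and the regular
expression assumes the usual cyclictest per-thread summary line
("T: 0 ( 1234) P:98 I:400 C: ... Min: ... Act: ... Avg: ... Max: ..."),
which may differ between rt-tests versions.

```python
import re
from statistics import mean

# Assumed shape of a cyclictest per-thread summary line, for example:
# "T: 0 ( 1234) P:98 I:400 C: 150000 Min:     20 Act:   35 Avg:   37 Max:     187"
SUMMARY_RE = re.compile(r"^T:\s*\d+.*?Min:\s*(\d+).*?Avg:\s*(\d+).*?Max:\s*(\d+)")


def parse_run(output):
    """Return (min, avg, max) for one run, combined across all threads."""
    rows = []
    for line in output.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m:
            rows.append(tuple(int(v) for v in m.groups()))
    if not rows:
        raise ValueError("no cyclictest summary lines found")
    mins, avgs, maxes = zip(*rows)
    # Min and Max are taken over all threads; Avg is the mean of thread averages.
    return min(mins), mean(avgs), max(maxes)


def summarize(runs):
    """Aggregate a list of per-run (min, avg, max) tuples."""
    mins, avgs, maxes = zip(*runs)
    return {
        "min": min(mins),                  # best case over all runs
        "avg": mean(avgs),                 # mean of the per-run averages
        "max": max(maxes),                 # worst case over all runs
        "mean_of_max": mean(maxes),        # the "mean of per-run Max" figure
        "spread_of_max": max(maxes) - min(maxes),  # run-to-run variation
    }
```

Feeding summarize() one tuple per 1-minute run gives the same kind of
figures as in the four cases above; spread_of_max is an extra number added
here to make the run-to-run variation visible directly.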

Have you used the hwlat ftrace tracer or hwlatdetect from rt-tests to check
whether your system has SMI-induced latency spikes? That may not be part of
the problem described here, but spurious SMI spikes could account for some
of the discrepancies.
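
As a rough sketch of the hwlat check, the tracer can be driven through
tracefs and its samples scanned for the largest gap. The assumptions here:
tracefs is mounted at /sys/kernel/tracing (older setups use
/sys/kernel/debug/tracing), the kernel has CONFIG_HWLAT_TRACER, the script
runs as root, and each sample line contains "inner/outer(us): N/M". The
function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: tracefs mount point; may be /sys/kernel/debug/tracing instead.
TRACEFS = Path("/sys/kernel/tracing")

# Assumed hwlat sample format: "... #0001 inner/outer(us):   12/14    ts:..."
SAMPLE_RE = re.compile(r"inner/outer\(us\):\s*(\d+)/(\d+)")


def start_hwlat(threshold_us=10, tracefs=TRACEFS):
    """Select the hwlat tracer; gaps above threshold_us get recorded."""
    (tracefs / "tracing_thresh").write_text(str(threshold_us))
    (tracefs / "current_tracer").write_text("hwlat")


def stop_hwlat(tracefs=TRACEFS):
    """Switch back to the nop tracer."""
    (tracefs / "current_tracer").write_text("nop")


def read_trace(tracefs=TRACEFS):
    """Return the current contents of the trace buffer."""
    return (tracefs / "trace").read_text()


def worst_gap_us(trace_text):
    """Largest inner or outer gap in the trace, or None if no samples."""
    gaps = [int(g) for m in SAMPLE_RE.finditer(trace_text) for g in m.groups()]
    return max(gaps) if gaps else None
```

A typical use would be start_hwlat(), letting the system sit idle for a
while, then worst_gap_us(read_trace()) and stop_hwlat(). The hwlat sampling
thread itself disturbs the CPU it runs on, so it is best run separately
from the cyclictest measurements rather than at the same time.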

Luis