Re: Non RT threads impact on RT thread

Julia Cartwright <julia@xxxxxx> · Tue, 22 May 2018 13:34:42 -0500

On Tue, May 22, 2018 at 12:00:27PM +0200, Jordan Palacios wrote:
> Hello,

Hello Jordan-

> We are currently running a version of the linux kernel (3.18.24) with
> the RT-PREEMPT patch. In our system there are several non RT tasks and
> one RT task. The RT process runs in the FIFO scheduler with 95
> priority and a control loop of 1ms.
>
> We have achieved latencies of about 5us which are perfect for us.
>
> Our issue is that the RT task sometimes misses one of its cycles due
> to an unexpected very long execution time of its control loop. In our
> system this is a critical failure.
>
> We enabled tracing in the kernel and started measuring the execution
> time of the RT thread. The execution time is quite constant (about
> 200us), with random spikes every now and then. Thing is, the fewer non
> RT tasks running in the system the better the RT task behaves.
>
> We wrote a very simple RT application that does some light work and
> writes its execution time using the trace_marker. Execution time is
> constant but IO intensive stuff, like a stress --io 32 or a hdparm,
> will have an impact on its execution time. This is surprising because
> the test does not do any kind of work related to IO. Nor does the RT task
> in our system for that matter.

You haven't specified anything about your hardware setup, nor any
numbers here showing the magnitude of these latency spikes.  Could you
elaborate?

> Our question is: Is this behaviour normal? Why are non RT tasks
> affecting the RT task performance?  Is there any other kind of test
> that we could run that would shed some light on this issue?

I think it's "normal" that there will be _some_ impact from non-RT
tasks, even across CPUs when those CPUs share some level of cache.
The question is what magnitude of impact should be expected.

Another thing you might want to look at is IRQ affinity
(/proc/irq/<N>/smp_affinity), in case the CPU running your RT task is
still servicing interrupts it should not be.
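
As a rough sketch of how you might check this from a program: the helper
below parses the hex mask format printed by /proc/irq/<N>/smp_affinity
and reports which IRQs are allowed on a given CPU.  The function names,
the CPU number, and the upper IRQ bound are placeholders I made up for
illustration, not anything from your setup.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `cpu` is set in a hex affinity mask string such as
 * "0000000f" or "00000001,00000000" (comma-separated 32-bit groups),
 * 0 if it is not set, -1 if the mask contains an unexpected character. */
int cpu_in_mask(const char *mask, int cpu)
{
    int bit = 0;                        /* bit index of current digit's LSB */
    size_t len = strlen(mask);

    while (len > 0 && isspace((unsigned char)mask[len - 1]))
        len--;                          /* ignore trailing newline */

    for (size_t i = len; i-- > 0; ) {   /* least significant digit first */
        unsigned char c = (unsigned char)mask[i];
        int v;

        if (c == ',')
            continue;                   /* group separator */
        if (!isxdigit(c))
            return -1;
        v = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        if (cpu >= bit && cpu < bit + 4)
            return (v >> (cpu - bit)) & 1;
        bit += 4;
    }
    return 0;                           /* cpu is beyond the mask width */
}

/* Print every IRQ in 0..max_irq whose affinity mask includes `cpu`.
 * Returns the number of such IRQs.  IRQs without a /proc entry are
 * skipped silently. */
int report_irqs_on_cpu(int cpu, int max_irq)
{
    char path[64], buf[256];
    int hits = 0;

    for (int irq = 0; irq <= max_irq; irq++) {
        FILE *f;

        snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
        f = fopen(path, "r");
        if (!f)
            continue;
        if (fgets(buf, sizeof(buf), f) && cpu_in_mask(buf, cpu) == 1) {
            buf[strcspn(buf, "\n")] = '\0';
            printf("IRQ %d may be delivered to CPU %d (mask %s)\n",
                   irq, cpu, buf);
            hits++;
        }
        fclose(f);
    }
    return hits;
}
```

Calling something like report_irqs_on_cpu(3, 255) for the CPU your RT
thread is pinned to would list the candidates; moving them away is then
a matter of writing a new mask to the same file as root.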

Your test already has some scaffolding for tracing.  Start a run with
tracing enabled and stop tracing once you observe a latency larger than
expected; dump the trace buffer; inspect.
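
A minimal sketch of that procedure, assuming debugfs is mounted at
/sys/kernel/debug and that your loop already takes a timestamp at the
start and end of each cycle.  The helper names and the threshold are
hypothetical; pick a threshold that suits your 1ms cycle.

```c
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Open a file under the tracing directory for writing, e.g.
 * "tracing_on" or "trace_marker".  The directory is an assumption;
 * adjust it if debugfs is mounted elsewhere.  Returns an fd or -1. */
int open_tracing_file(const char *name)
{
    char path[128];

    snprintf(path, sizeof(path), "/sys/kernel/debug/tracing/%s", name);
    return open(path, O_WRONLY);
}

/* Elapsed time from `a` to `b` in microseconds. */
long elapsed_us(const struct timespec *a, const struct timespec *b)
{
    return (long)(b->tv_sec - a->tv_sec) * 1000000L +
           (b->tv_nsec - a->tv_nsec) / 1000L;
}

/* If the cycle took longer than threshold_us, note it in the trace
 * (when marker_fd >= 0) and write "0" to tracing_on_fd so the ring
 * buffer is frozen with the offending events still in it.
 * Returns 1 if tracing was stopped, 0 if the cycle was within the
 * threshold, -1 if the write to tracing_on_fd failed. */
int stop_trace_if_late(int tracing_on_fd, int marker_fd,
                       long us, long threshold_us)
{
    char msg[64];
    int n;

    if (us <= threshold_us)
        return 0;

    n = snprintf(msg, sizeof(msg), "overrun: %ld us\n", us);
    if (marker_fd >= 0 && n > 0 && write(marker_fd, msg, (size_t)n) < 0) {
        /* The marker is only a convenience; carry on without it. */
    }
    if (write(tracing_on_fd, "0", 1) != 1)
        return -1;
    return 1;
}
```

Open both files once before entering the loop, take clock_gettime()
timestamps with CLOCK_MONOTONIC around the work, and call
stop_trace_if_late() at the end of each cycle.  Once it returns 1, read
the "trace" file in the same directory to see what ran on that CPU
during the long cycle.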

  Julia