Re: Non RT threads impact on RT thread

Julia Cartwright <julia@xxxxxx> · Wed, 23 May 2018 11:07:08 -0500

On Wed, May 23, 2018 at 05:43:57PM +0200, Jordan Palacios wrote:
> Hello,
>
> Thanks for the answers.
>
> We don't have any nvidia card installed on the system.
>
> We'll try isolcpus in conjunction with our cpuset setup and we'll
> look into the irq_smp_affinity.

Given the spike magnitudes you are seeing, I doubt they are related to
task migration, so I don't think that isolcpus will make a
difference.

> These are some of the specs of the system. Let me know if you need
> something else that might be relevant.
>
> Active module: Congatec conga-TS77/i7-3612QE
> Carrier: Connect Tech CCG008
> DDR3L-SODIMM-1600 (8GB)
> Crucial MX200 250GB mSATA SSD
>
> I have uploaded one graph with an example of our issue here:
>
> https://i.imgur.com/8KoxzNV.png
>
> In blue the time between cycles and in green the execution time of
> each loop. X is in seconds and Y in microseconds. As you can see the
> execution time is quite constant until we run some intensive IO tasks.
> In this case those spikes are caused by a hdparm -tT /dev/sda. In this
> particular instance the spike is no issue since it's less than our task
> period.

Interesting.  Does that 2-second higher-latency window directly coincide
with the starting/stopping of the hdparm load?

> The problem arises when spikes that are particularly nasty make us go
> over the 1ms limit, resulting in an overrun. Here is an example:
>
> https://i.imgur.com/77sgj3S.png
>
> Till now we have only used tracing in our example application but we
> haven't been able to draw any conclusions. I'll try to obtain a trace
> of our main update cycle when one of these spikes happens.

This would be most helpful.  The first step will be to confirm the
assumption that nothing else is executing on the CPU with this RT task.
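
A rough sketch of how that check could be done from a captured trace,
assuming the text output of the sched_switch event has been saved from
the tracefs "trace" file. The function name, the task name "rt_task",
and the sample lines are hypothetical; the line layout is assumed to be
the usual "[cpu] ... sched_switch: ... ==> next_comm=... next_pid=..."
form and may need adjusting for a particular kernel.

```python
import re
from collections import Counter

# Assumed layout of a sched_switch line in the text trace output:
#   <task>-<pid> [CPU] flags timestamp: sched_switch: ... ==> next_comm=X next_pid=N ...
SWITCH_RE = re.compile(
    r"\[(\d+)\].*?sched_switch:.*?==> next_comm=(.+?) next_pid=(\d+)"
)


def other_tasks_on_cpu(lines, cpu, rt_comm):
    """Count how often tasks other than rt_comm were switched in on cpu."""
    seen = Counter()
    for line in lines:
        match = SWITCH_RE.search(line)
        if match is None:
            continue  # not a sched_switch line
        if int(match.group(1)) != cpu:
            continue  # event happened on a different CPU
        next_comm = match.group(2)
        if next_comm != rt_comm:
            seen[next_comm] += 1
    return seen


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "rt_task-1234 [003] d..2 100.000100: sched_switch: prev_comm=rt_task "
        "prev_pid=1234 prev_prio=9 prev_state=S ==> next_comm=kworker/3:1 "
        "next_pid=57 next_prio=120",
        "bash-99 [001] d..2 100.000200: sched_switch: prev_comm=bash "
        "prev_pid=99 prev_prio=120 prev_state=S ==> next_comm=hdparm "
        "next_pid=100 next_prio=120",
    ]
    for comm, count in other_tasks_on_cpu(sample, cpu=3, rt_comm="rt_task").items():
        print(comm, count)
```

The idle task (swapper/N) will show up in the result whenever the RT
task sleeps, so it is the other entries that are of interest.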

Also, keep in mind that tracing induces some overhead, so you might need
to adjust your threshold accordingly.  I've found that most of the
latency issues I've debugged can be diagnosed via the irq, sched, and
timer trace events (maybe syscalls as well), so that's where I typically
start.
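
For illustration, a minimal sketch of switching on those event groups
through tracefs. It assumes tracefs is mounted at /sys/kernel/tracing
(on some systems it is under /sys/kernel/debug/tracing instead) and
that the script runs as root; the function name is hypothetical.

```python
import os

# Assumption: tracefs mount point; older setups may use
# /sys/kernel/debug/tracing instead.
TRACEFS = "/sys/kernel/tracing"


def enable_event_groups(groups=("irq", "sched", "timer"), tracefs=TRACEFS):
    """Write "1" to events/<group>/enable; return the groups enabled."""
    enabled = []
    for group in groups:
        path = os.path.join(tracefs, "events", group, "enable")
        if not os.path.exists(path):
            continue  # group not available on this kernel
        with open(path, "w") as f:
            f.write("1")
        enabled.append(group)
    return enabled


if __name__ == "__main__":
    print(enable_event_groups())
```

Passing groups=("irq", "sched", "timer", "syscalls") would add the
syscall events where the kernel provides them.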

It may also be worth a test with a later -rt kernel series like 4.14-rt
or even 4.16-rt to see if you can reproduce the issue there.

   Julia