Re: Non RT threads impact on RT thread

Jordan Palacios <jordan.palacios@xxxxxxxxxxxxxxxx> · Wed, 23 May 2018 18:19:31 +0200

On 23 May 2018 at 18:07, Julia Cartwright <julia@xxxxxx> wrote:
> On Wed, May 23, 2018 at 05:43:57PM +0200, Jordan Palacios wrote:
>> Hello,
>>
>> Thanks for the answers.
>>
>> We don't have any nvidia card installed on the system.
>>
>> We'll try isolcpus in conjunction with our cpuset setup and we'll
>> look into the irq_smp_affinity.
>
> Given the spike magnitudes you are seeing, I doubt they are task
> migration related; meaning I don't think that isolcpus will make a
> difference.
>
>> These are some of the specs of the system. Let me know if you need
>> something else that might be relevant.
>>
>> Active module: Congatec conga-TS77/i7-3612QE
>> Carrier: Connect Tech CCG008
>> DDR3L-SODIMM-1600 (8GB)
>> Crucial MX200 250GB mSATA SSD
>>
>> I have uploaded one graph with an example of our issue here:
>>
>> https://i.imgur.com/8KoxzNV.png
>>
>> Blue shows the time between cycles and green the execution time of
>> each loop. X is in seconds and Y in microseconds. As you can see, the
>> execution time is quite constant until we run some intensive IO tasks.
>> In this case the spikes are caused by hdparm -tT /dev/sda. In this
>> particular instance the spike is not an issue since it's less than our
>> task period.
>
> Interesting.  Does that 2-second higher-latency window directly coincide
> with the starting/stopping of the hdparm load?

Yes. To be more precise, it coincides with the part of the hdparm run
that tests cached reads.

>> The problem arises when spikes that are particularly nasty make us go
>> over the 1ms limit, resulting in an overrun. Here is an example:
>>
>> https://i.imgur.com/77sgj3S.png
>>
>> Till now we have only used tracing in our example application but we
>> haven't been able to draw any conclusions. I'll try to obtain a trace
>> of our main update cycle when one of these spikes happens.
>
> This would be most helpful.  The first step will be to confirm the
> assumption that nothing else is executing on the CPU with this RT task.
>
> Also, keep in mind that tracing induces some overhead, so you might need
> to adjust your threshold accordingly.  I've found that most of the
> latency issues I've debugged can be found via the irq, sched, and timer
> trace events (maybe syscalls as well) so that's where I typically start.
>
> It may also be worth a test with a later -rt kernel series like 4.14-rt
> or even 4.16-rt to see if you can reproduce the issue there.
>
>    Julia

Thanks Julia. I'll look into it and report back.
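
For reference, here is a rough sketch of how I understand the suggested
approach: enable the irq, sched and timer event groups, then freeze the
ring buffer as soon as a cycle exceeds a threshold so the trace ends at
the spike. It assumes tracefs is mounted at /sys/kernel/tracing (it may
be under /sys/kernel/debug/tracing instead). The helper names
(write_control, start_tracing, stop_if_overrun) are made up for
illustration, and the real check would live in the RT loop itself rather
than in Python.

```python
import os

# Assumed tracefs mount point; some setups use /sys/kernel/debug/tracing.
TRACEFS = "/sys/kernel/tracing"
# Event groups suggested in the thread; "syscalls" could be added as well.
SUBSYSTEMS = ("irq", "sched", "timer")


def write_control(tracefs, relpath, value):
    """Write a value to a tracefs control file (needs root on a real system)."""
    with open(os.path.join(tracefs, relpath), "w") as f:
        f.write(value)


def start_tracing(tracefs=TRACEFS, subsystems=SUBSYSTEMS):
    """Enable the chosen event groups and switch the ring buffer on."""
    for name in subsystems:
        write_control(tracefs, os.path.join("events", name, "enable"), "1")
    write_control(tracefs, "tracing_on", "1")


def stop_if_overrun(cycle_us, threshold_us, tracefs=TRACEFS):
    """Freeze the ring buffer when a cycle exceeds the threshold.

    Returns True if tracing was stopped, False otherwise.
    """
    if cycle_us <= threshold_us:
        return False
    # Leave a marker in the trace so the overrun is easy to locate.
    write_control(tracefs, "trace_marker", "overrun: %d us\n" % cycle_us)
    write_control(tracefs, "tracing_on", "0")
    return True
```

The idea would be to call start_tracing() once at startup and
stop_if_overrun(cycle_us, threshold_us) after every cycle, with the
threshold set somewhat above the usual 1000 us limit to allow for the
tracing overhead mentioned above. The captured events can then be read
from the "trace" file in the same directory.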

Kind regards.

Jordan.