RT-thread on cpu0 affects performance of RT-thread on isolated cpu1

Hello,

I am an engineering student and I am trying to prove that a 4000Hz hard real-time
application can run on an ARM board rather than on a more powerful machine.

I am working with a dual-core IMX6 and the PREEMPT_RT patch 4.1.38-rt46.
I expected that my 4000Hz thread would perform better if it were the only one
on core1, so I added the boot argument isolcpus=1 and bound my thread to cpu1.
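
For reference, the binding looks roughly like this; this is a simplified sketch
of how thread1 is set up (helper name and error handling are only for
illustration, the real code differs a bit):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Create a SCHED_FIFO priority-99 thread pinned to cpu1 (the isolated core). */
static pthread_t create_pinned_rt_thread(void *(*fn)(void *))
{
     pthread_t tid;
     pthread_attr_t attr;
     struct sched_param sp = { .sched_priority = 99 };
     cpu_set_t cpus;

     CPU_ZERO(&cpus);
     CPU_SET(1, &cpus);                /* cpu1 only */

     pthread_attr_init(&attr);
     pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
     pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
     pthread_attr_setschedparam(&attr, &sp);
     pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);

     pthread_create(&tid, &attr, fn, NULL);
     pthread_attr_destroy(&attr);
     return tid;
}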

Note that even with isolcpus=1, the following kernel threads remain on core1:

   PID    PSR    RTPRIO    CMD
   16     1      99        [migration/1]
   17     1      -         [rcuc/1]
   18     1      1         [ktimersoftd/1]
   19     1      -         [ksoftirqd/1]
   20     1      99        [posixcputmr/1]
   21     1      -         [kworker/1:0]
   22     1      -         [kworker/1:0H]

I tried several permutations of my kernel configuration and boot arguments
(rcu_nocbs, for example), and none of them affected the results I describe below.
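
To give an idea of what I mean, a boot line in that spirit looks like
(nohz_full is shown only as a further example of this kind of option; it
requires CONFIG_NO_HZ_FULL):

   isolcpus=1 rcu_nocbs=1 nohz_full=1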


I use a script to stress Linux. I expected that only cpu0 would be stressed,
since cpu1 is isolated, but the stress also has an impact on the thread on cpu1.
I think that is normal.


First, as drawn in red in “expected_behavior.png”, I expected much less
variation in the Latency and especially in the Execution time
(my thread always does the same work).

How can such large timing variations be explained? As I said, I tried to
deactivate all interrupts on cpu1 (RCU and the other kernel threads listed
above), but I am not very familiar with that.
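
(What I mean by deactivating interrupts is roughly the standard mechanism:
checking the per-CPU counts in /proc/interrupts and steering IRQs towards
cpu0 with their affinity masks, e.g.:

   cat /proc/interrupts
   echo 1 > /proc/irq/<N>/smp_affinity

where the mask 1 means cpu0 only and <N> is an IRQ number, but I am not sure
I did this correctly or completely.)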



Then, while trying to debug this, I was even more surprised when I put
another thread on core0: it improved the behavior of the thread on core1!


My application looks like:

main(){

     create a first 4000Hz thread (thread1), prio = 99, cpu = 1
     /*cpu1 is isolated*/

     create a second 4000Hz thread (thread0), prio = 98, cpu = 0
     /*Creating this thread (on cpu0) improves the performance of
      *the other thread (on cpu1)!*/

     start both threads

     while(1){
          print_stat();
     }

}

thread1(){

     struct timespec start, stop, next, interval = 250us;

     /* Initialization of the periodicity */
     clock_gettime(CLOCK_REALTIME, &next);
     next += interval;

     while(1){
          /*Releases at specified rate*/
          clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, NULL);
          /*Get time to check jitter and execution time*/
          clock_gettime(CLOCK_REALTIME, &start);
          do_job();
          /*Get time to check execution time*/
          clock_gettime(CLOCK_REALTIME, &stop);
          do_stat(); //jitter = start-next; exec_time = stop-start
          next += interval;
     }

}

thread0(){
    struct timespec next, interval = 250us;

     /* Initialization of the periodicity */
     clock_gettime(CLOCK_REALTIME, &next);
     next += interval;

     while(1){
          /*Releases at specified rate*/
          clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, NULL);
          usleep(100);
          /* Without the 100us sleep, only the Latency of the other
           * thread (on cpu1) is improved.
           * Sleeping 100us in this new 4000Hz thread (on cpu0) also
           * improves the Execution time of the other thread (on cpu1)... */
          next += interval;
     }

}
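
In the pseudocode above, "next += interval" is shorthand for ordinary timespec
arithmetic, roughly the following (the helper name is just for illustration):

#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define PERIOD_NS    250000L      /* 250us -> 4000Hz */

/* Advance the absolute release time by one period and keep tv_nsec normalized. */
static void timespec_add_ns(struct timespec *t, long ns)
{
     t->tv_nsec += ns;
     while (t->tv_nsec >= NSEC_PER_SEC) {
          t->tv_nsec -= NSEC_PER_SEC;
          t->tv_sec++;
     }
}

/* i.e. "next += interval" stands for timespec_add_ns(&next, PERIOD_NS); */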

As you can see in “background_thread_on_core_0.png”, the Latency and the
Execution time (of the thread on core1) are improved (in comparison with
“no_background_thread.png”) when there is a new 4000Hz thread on cpu0
AND when this thread does something...

I tried a lot of permutations and I do not understand the following:
- If the new thread (on cpu0) runs at 5000Hz (>4000Hz), the observations
  are the same (the performance of the thread on cpu1 improves).
- If the new thread runs at 2000Hz (<4000Hz), there is no improvement...

- If the new thread (4000Hz on cpu0) does something (even just sleeping long
  enough), the Execution time of the thread on cpu1 improves.
- If the new thread does nothing (or does too little work), then ONLY the
  Latency of the thread on cpu1 is improved...

Do you have any experience with this, or any idea how to debug it?
I wonder whether the scheduler or the clock tick is bound to cpu0, and whether
that can play a role in the responsiveness of the thread on cpu1 (the isolated one).

Thanks,

Regards,

Yann

Attachment: background_thread_on_core_0.png
Description: PNG image

Attachment: expected_behavior.png
Description: PNG image

Attachment: no_background_thread.png
Description: PNG image

