Re: RT-thread on cpu0 affects performance of RT-thread on isolated cpu1


 



Hello,

Thank you for helping :)

>> I am an engineering student trying to prove that a 4000Hz hard real-time
>> application can run on an ARM board rather than on a more powerful machine.
>> 
>> I work with an IMX6 dual-core and PREEMPT_RT patch-4.1.38-rt46.
>> I expected that my 4000Hz thread would perform better if it were the only
>> one on core1, so I added the boot argument isolcpus=1 and bound my thread to cpu1.
>> 
>> With isolcpus=1, note that these processes remain on core1:
>> 
>>    PID    PSR    RTPRIO    CMD
>>    16     1      99        [migration/1]
>>    17     1      -         [rcuc/1]
>>    18     1      1         [ktimersoftd/1]
>>    19     1      -         [ksoftirqd/1]
>>    20     1      99        [posixcputmr/1]
>>    21     1      -         [kworker/1:0]
>>    22     1      -         [kworker/1:0H]
>> 
>> I tried several permutations in my kernel configuration and boot args
>> (rcu_nocbs is an example) and none affected the results I describe below.
> 
> In general with isolcpus you should be able to get most tasks off CPU1.
> The rcu_nocbs option should get the RCU callbacks off CPU1. And then you
> can change the IRQ affinity to CPU0. But this looks good.
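(For reference, pinning device IRQs to CPU0 as you suggest can be scripted roughly like this. A sketch only: IRQ numbers vary per board, root is required, and per-CPU IRQs such as the twd local timer will reject the write.)

```shell
# IRQ affinity masks are hexadecimal CPU bitmaps: bit 0 = CPU0, bit 1 = CPU1.
cpu0_mask=$(printf '%x' $((1 << 0)))
echo "CPU0 affinity mask: $cpu0_mask"

# Pin every movable IRQ to CPU0 (needs root; per-CPU IRQs like the
# local timer -- irq=16 name=twd in the trace -- cannot be moved):
# for irq in /proc/irq/[0-9]*; do
#     echo "$cpu0_mask" > "$irq/smp_affinity" 2>/dev/null || true
# done
```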

As I said in my previous message, I had deactivated those options because I
was not sure about them and did not observe any improvement when they were
enabled. I agree this is what I want, so I re-enabled them; my kernel config
now looks like:

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
# CONFIG_NO_HZ_FULL_SYSIDLE is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# RCU Subsystem
#
...
CONFIG_RCU_NOCB_CPU=y
# CONFIG_RCU_NOCB_CPU_NONE is not set
# CONFIG_RCU_NOCB_CPU_ZERO is not set
CONFIG_RCU_NOCB_CPU_ALL=y

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
# CONFIG_CPU_FREQ_STAT is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
...
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_GOV_POWERSAVE is not set
...
CONFIG_ARM_IMX6Q_CPUFREQ=y
...

#
# CPU Idle
#
# CONFIG_CPU_IDLE is not set
# CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set

#
# Kernel Features
#
CONFIG_HZ_FIXED=0
CONFIG_HZ_100=y
...
CONFIG_HZ=100
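(For reference: since CONFIG_NO_HZ_FULL_ALL=y and CONFIG_RCU_NOCB_CPU_ALL=y already apply to all CPUs, no extra boot arguments should be needed. If I later switch away from the _ALL variants, my understanding is the per-CPU equivalent would be selected on the kernel command line, something like:)

```
isolcpus=1 nohz_full=1 rcu_nocbs=1
```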




>> I use a script to stress Linux. I expected that only cpu0 would be stressed,
>> since cpu1 is isolated, but the stress has an impact on the thread on cpu1 too.
>> I think this is normal.
> 
> The CPU-caches might be shared. Locks which are held by CPU0 and
> required by CPU1 will also slow down CPU1.
> The sched_switch tracer should show you if anything of your "script to
> stress" migrates to CPU1 and/or if it "delays" CPU1. That would make
> your task on CPU1 go from state R to D and back to R.

Thank you; I had never used Linux kernel tracing before. It was not enabled
in my kernel, so I enabled it. I have some results below.

>> First, as I drew it (in red) on “expected_behavior.png”, I expected much less
>> variation in the Latency and especially the Execution time.
>> (My thread always does the same thing).
> 
> You could try cyclictest and compare its latency. However, from the plot
> it looks like about 25us, so it is not that bad.
> 
>> How can we explain such large time variations? As I said, I tried to disable
>> all interrupts on cpu1 (rcu and the other processes above) but I am not very
>> familiar with that.
>>
> As I suggested above, run sched_switch, measure the latency and, if you
> hit an execution time of 150us, tell your application to disable the
> trace (writing '0' to the tracing_on sysfs file would do it); then
> you can look at the trace and see if it got interrupted by anything.
> 
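(Side note: for the "write '0' to tracing_on" part, I plan to use a small helper like this. A sketch only: the tracefs path depends on where debugfs is mounted -- probably /sys/kernel/debug/tracing/tracing_on here -- so it is passed as a parameter.)

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Freeze the ftrace ring buffer by writing "0" to the tracing_on
 * file, e.g. "/sys/kernel/debug/tracing/tracing_on".
 * Returns 0 on success, -1 on error. */
static int stop_tracing(const char *tracing_on_path)
{
    int fd = open(tracing_on_path, O_WRONLY);
    if (fd < 0) {
        perror("open tracing_on");
        return -1;
    }
    if (write(fd, "0", 1) != 1) {
        perror("write tracing_on");
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The measurement loop can then call stop_tracing() as soon as exec_time exceeds the threshold, so the interesting events are still in the buffer.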
>> Then, I am even more surprised when, trying to debug that, I decided to put
>> another thread on core0 and it improved the behavior of the thread on core1!
> 
> If you put more load on the system and the numbers improve then this
> might be because the caches are filled with the "right" data. Also it
> might prevent power management from doing its job.

About power management: as you can see in the kernel config, *it is enabled*,
but *only the performance governor* is enabled.
I already worked on that: I first deactivated it entirely, but then my clock
was not at its maximum. Re-enabling it with only the performance governor
got the max clock frequency back.


>> My application looks like:

Thanks for reading the code :)
I already check whether next <= now; see how I did it in the updated code.
I changed to CLOCK_MONOTONIC, thanks.
I now do error checking!

> …
>> thread1(){
>> 
>>      struct timespec start, stop, next, interval = 250us;
>> 
>>      /* Initialization of the periodicity */
>>      clock_gettime(CLOCK_MONOTONIC, &next);
>>
>>      /* as time basis, try not to use a random time but start with usec = 25 *
>>       * or 50 or so. You should be able to avoid the HZ timer.               */
>>      next.tv_sec += 1;
>>      next.tv_nsec = 250000;

Is it what you expected I did?

>>      next += interval;
>> 
>>      while(1){
>>           /*Releases at specified rate*/
>>           if(clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0){
>>                perror("error");
>>                exit(-1);
>>           }
>>           /*Get time to check jitter and execution time*/
>>           if(clock_gettime(CLOCK_MONOTONIC, &start) != 0){
>>                perror("error");
>>                exit(-1);
>>           }
>>           do_job();
>>           /*Get time to check execution time*/
>>           if(clock_gettime(CLOCK_MONOTONIC, &stop) != 0){
>>                perror("error");
>>                exit(-1);
>>           }
>>           do_stat(); //jitter = start-next; exec_time = stop-start
>>           while(next <= stop){
>>                next += interval;
>>           }
>>      }
>> 
>> }
>> 
>> 
>> thread0(){
>>     struct timespec next, interval = 250us;
>> 
>>      /* Initialization of the periodicity */
>>      clock_gettime(CLOCK_REALTIME, &next);
>>      next += interval;
>> 
>>      while(1){
>>           /*Releases at specified rate*/
>>           clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, NULL);
>>           usleep(100);
>>           /****************************************************************
>>            * Without sleeping 100us, only the Latency of the other thread *
>>            * (on cpu1) is improved.                                       *
>>            * Sleeping 100us in this new 4000Hz thread (cpu0) improved     *
>>            * the execution time of the other thread (on cpu1)...          *
>>            ****************************************************************/
> 
> I'm not sure but it looks like the usleep() is implemented as a
> busy loop. A busy loop of 100us isn't *that* bad but if all RT tasks
> take around 950us then the scheduler will prevent all RT tasks from
> running (in order to prevent a possible lockup).
> 
>>           next += interval;
>>      }
>> 
>> }
>> 
>> As you can see in “background_thread_on_core_0.png”, the Latency and the
>> Execution time (of the thread on core1) are improved (in comparison with
>> “no_background_thread.png”) when there is a new 4000Hz thread on cpu0
>> AND when this thread does something...
>> 
>> I tried a lot of permutations and I do not understand:
>> - If the new thread (cpu0) is at 5000Hz (>4000Hz), then observations
>>   are the same (performance of the thread on cpu1 improves)
>> - If the new thread is at 2000Hz (<4000Hz), then there is no improvement...
>> 
>> - If the new thread (4000Hz on cpu0) does something (even sleeping enough
>>   time), then the Execution time of the thread on cpu1 improves.
>> - If the new thread does nothing (or does too little), then ONLY the
>>   Latency of the thread on cpu1 is improved...
>> 
>> Do you have any experience with that, any idea to debug?
> 
> - tracing to see if the scheduler puts another task on while your RT is
>  running
> - tracing to see if an interrupts fires which prevents your task from
>  running. This should only be the timer interrupt since everything else
>  should be only on CPU0.

I analyse a trace below.
Only the ktimersoftd/1 task comes back, every 10000us (the tick period,
1/HZ, with CONFIG_HZ=100).
I first had CONFIG_HZ=1000, so back then it came back every 1000us.
I first thought it was responsible for my problems, but apparently it is
not. I am not sure.

> - check if any power management is on and try to disable it.

As I said above, it is *on* but *only the performance governor is allowed*. Is that correct?

> - as time basis, try not to use a random time but start with usec = 25  
> or 50 or so. You should be able to avoid the HZ timer.

I tried to start with usec = 25. Is it correct in the code above?

Are you talking about the CONFIG_NO_HZ option?
I am confident it is what I want but, according to the documentation,
I am worried:
   "POSIX CPU timers prevent CPUs from entering adaptive-tick mode.
   Real-time applications needing to take actions based on CPU time
   consumption need to use other means of doing so."
Indeed, I want to continue to use the POSIX CPU timer (sleep and gettime).

What's more, this test from Frederic fails:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/frederic/dynticks-testing/+/9b39ab06956d484d5311e0678e1fba8104a8e2e3
According to this message, I cannot stop the tick:
user_loop-15296 [001] dn.....  8131.952509: tick_stop: success=no msg=more than 1 task in runqueue
Moreover, I do not see this message when tracing my application below.




I attached the trace (trace.txt) to this message.
The trace stopped when exec_time was more than 200us.

I "normalized" (reset to 0) the timestamps in two places to make it easier
to follow what is going on.

What I understand:

Lines 5-8:   It is the clock_nanosleep handler.
             It takes 13us to wake up the thread. Good.
Line 9:      The thread took 60us to run. Very good.
Line 10:     Let's start again 250us later. Good.
Line 11:     tick_sched_timer... It will wake up the ktimersoftd/1.
             I am afraid :(
Line 15:     It took 35us to wake up the thread...
             Not so bad; sometimes it is more...
Line 16:     The thread took 68us to run. Ok.
Lines 17-19: ktimersoftd/1 took 17us to run (370-353).
             I think it is not so worrying. I am reassured :)
...
Line 69:     1000us, ok
Line 74:     1256us, 6us late :(
Line 77:     36us to wake up the thread, +6us = 42us jitter :( :(
Line 78:     ***255us (1547-1292) execution time !!!***
Line 79:     Overrun, next rdv at 1750us, 8us late...

Does this trace help?
As far as I can tell, there is nothing else on CPU1, just this ktimersoftd/1.
Can the idle task (on cpu0) be delayed by the stress on cpu0?
Can we migrate the Linux scheduler to cpu1?

Thanks again,

Yann
...
		
          <idle>-0     [001] d..h1..  	0	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	2	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864569754886
          <idle>-0     [001] dn.h1..  	7	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	13	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	73	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	250	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	253	: hrtimer_expire_entry: hrtimer=bf7af990 function=tick_sched_timer now=1864570005553
          <idle>-0     [001] d..h1..  	259	: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d..h1..  	263	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864570005553
          <idle>-0     [001] dn.h1..  	268	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	285	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	353	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=ktimersoftd/1 next_pid=21 next_prio=98
   ktimersoftd/1-21    [001] .....11  	361	: softirq_entry: vec=1 [action=TIMER]
   ktimersoftd/1-21    [001] .....11  	364	: softirq_exit: vec=1 [action=TIMER]
   ktimersoftd/1-21    [001] d...2..  	370	: sched_switch: prev_comm=ktimersoftd/1 prev_pid=21 prev_prio=98 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	501	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	505	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864570257220
          <idle>-0     [001] dn.h1..  	512	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	520	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	605	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	750	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	754	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864570506220
          <idle>-0     [001] dn.h1..  	760	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	767	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	841	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	1000	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	1004	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864570755220
          <idle>-0     [001] dn.h1..  	1010	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	1016	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	1088	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	1250	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	1254	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864571005886
          <idle>-0     [001] dn.h1..  	1259	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	1265	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
		
...
		
          <idle>-0     [001] d..h1..  	0	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	3	: hrtimer_expire_entry: hrtimer=bf7af990 function=tick_sched_timer now=1864580005886
          <idle>-0     [001] d..h1..  	10	: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d..h1..  	15	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864580005886
          <idle>-0     [001] dn.h1..  	21	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	41	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	123	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=ktimersoftd/1 next_pid=21 next_prio=98
   ktimersoftd/1-21    [001] .....11  	135	: softirq_entry: vec=1 [action=TIMER]
   ktimersoftd/1-21    [001] .....11  	138	: softirq_exit: vec=1 [action=TIMER]
   ktimersoftd/1-21    [001] d...2..  	147	: sched_switch: prev_comm=ktimersoftd/1 prev_pid=21 prev_prio=98 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	251	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	255	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864580256553
          <idle>-0     [001] dn.h1..  	260	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	267	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	359	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	500	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	504	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864580506220
          <idle>-0     [001] dn.h1..  	510	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	519	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	610	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	750	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	754	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864580755886
          <idle>-0     [001] dn.h1..  	760	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	769	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	856	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	1000	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	1004	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864581005886
          <idle>-0     [001] dn.h1..  	1010	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	1017	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	1110	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	1256	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	1265	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864581264220
          <idle>-0     [001] dn.h1..  	1277	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	1292	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1
    test_preempt-5583  [001] d...2..  	1547	: sched_switch: prev_comm=test_preempt prev_pid=5583 prev_prio=1 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [001] d..h1..  	1758	: irq_handler_entry: irq=16 name=twd
          <idle>-0     [001] d..h1..  	1770	: hrtimer_expire_entry: hrtimer=98117ef0 function=hrtimer_wakeup now=1864581767220
          <idle>-0     [001] dn.h1..  	1788	: irq_handler_exit: irq=16 ret=handled
          <idle>-0     [001] d...2..  	1808	: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=test_preempt next_pid=5583 next_prio=1

	exec_time > 200us : echo 0 > tracing_on (trace stopped here)
		
