Hello,

I am an engineering student and I am trying to prove that a 4000Hz hard real-time application can run on an ARM board rather than on a more powerful machine. I work with an IMX6 dual-core and PREEMPT_RT patch-4.1.38-rt46.

I expected that my 4000Hz thread would perform better if it were the only one on core1, so I added the boot argument isolcpus=1 and bound my thread to cpu1. Note that even with isolcpus=1, these kernel threads remain on core1:

    PID PSR RTPRIO CMD
     16   1     99 [migration/1]
     17   1      - [rcuc/1]
     18   1      1 [ktimersoftd/1]
     19   1      - [ksoftirqd/1]
     20   1     99 [posixcputmr/1]
     21   1      - [kworker/1:0]
     22   1      - [kworker/1:0H]

I tried several permutations of my kernel configuration and boot args (rcu_nocbs is one example), and none of them affected the results I describe below.

I use a script to stress Linux. I expected that only cpu0 would be stressed, since cpu1 is isolated, but the stress has an impact on the thread on cpu1 too. I think that is normal.

First, as I drew (in red) on “expected_behavior.png”, I expected much less variation in the Latency and especially in the Execution time (my thread always does the same thing). How can such large time variations be explained? As I said, I tried to deactivate all interrupts on cpu1 (rcu and the other processes listed above), but I am not very familiar with that.

Then I was even more surprised when, while trying to debug this, I put another thread on core0 and it improved the behavior of the thread on core1!
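For reference, a minimal sketch of how a thread with the settings described above (priority 99, bound to cpu1) could be configured through pthread attributes. The helper name `make_rt_attr` is hypothetical, and SCHED_FIFO is an assumption: the post gives the priority but does not name the scheduling policy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: fill attr so that a thread created with it uses
 * SCHED_FIFO (assumed policy) at priority prio and may only run on cpu.
 * Returns 0 on success or an errno-style value on failure. */
int make_rt_attr(pthread_attr_t *attr, int prio, int cpu)
{
    struct sched_param sp = { .sched_priority = prio };
    cpu_set_t set;
    int rc;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    if ((rc = pthread_attr_init(attr)) != 0)
        return rc;
    /* Without PTHREAD_EXPLICIT_SCHED the new thread inherits the creator's
     * policy and priority, and the two settings below are ignored. */
    if ((rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0)
        return rc;
    /* Set the policy before the priority so the priority is checked
     * against the SCHED_FIFO range. */
    if ((rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0)
        return rc;
    if ((rc = pthread_attr_setschedparam(attr, &sp)) != 0)
        return rc;
    /* Restrict the thread to the single requested CPU. */
    return pthread_attr_setaffinity_np(attr, sizeof(set), &set);
}
```

The attributes would then be passed to `pthread_create`. Creating a SCHED_FIFO thread normally requires root or a sufficient RLIMIT_RTPRIO; otherwise `pthread_create` fails with EPERM.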
My application looks like this (pseudocode):

    main(){
        create a first 4000Hz thread (thread1), prio = 99, cpu = 1    /* cpu1 is isolated */
        create a second 4000Hz thread (thread0), prio = 98, cpu = 0
            /* Creating this thread (cpu0) improves the performance of
             * the other thread (cpu1)! */
        start both threads
        while(1){
            print_stat();
        }
    }

    thread1(){
        struct timespec start, stop, next, interval = 250us;

        /* Initialization of the periodicity */
        clock_gettime(CLOCK_REALTIME, &next);
        next += interval;

        while(1){
            /* Releases at specified rate */
            clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, NULL);
            /* Get time to check jitter and execution time */
            clock_gettime(CLOCK_REALTIME, &start);
            do_job();
            /* Get time to check execution time */
            clock_gettime(CLOCK_REALTIME, &stop);
            do_stat();    /* jitter = start - next; exec_time = stop - start */
            next += interval;
        }
    }

    thread0(){
        struct timespec next, interval = 250us;

        /* Initialization of the periodicity */
        clock_gettime(CLOCK_REALTIME, &next);
        next += interval;

        while(1){
            /* Releases at specified rate */
            clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, NULL);
            usleep(100);
            /* Without sleeping 100us, only the Latency of the other thread
             * (on cpu1) is improved.
             * Sleeping 100us in this new 4000Hz thread (cpu0) also improves
             * the execution time of the other thread (on cpu1)... */
            next += interval;
        }
    }

As you can see in “background_thread_on_core_0.png”, the Latency and the Execution time of the thread on core1 are improved (compared with “no_background_thread.png”) when there is a new 4000Hz thread on cpu0 AND when this thread does something...

I tried a lot of permutations and I do not understand the results:

- If the new thread (cpu0) runs at 5000Hz (>4000Hz), then the observations are the same (the performance of the thread on cpu1 improves).
- If the new thread runs at 2000Hz (<4000Hz), then there is no improvement...
- If the new thread (4000Hz on cpu0) does something (even just sleeping for long enough), then the Execution time of the thread on cpu1 improves.
- If the new thread does nothing (or does too little), then ONLY the Latency of the thread on cpu1 improves...

Do you have any experience with this, or any idea how to debug it? I wonder whether the scheduler or the clock tick is bound to cpu0, and whether that can play a role in the responsiveness of the thread on cpu1 (the isolated one).

Thanks,
Regards,
Yann
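The `next += interval` step in the pseudocode of thread1 is not valid C for a `struct timespec`, so here is a minimal sketch of what the periodic loop and the jitter and execution-time measurement might look like in real code. The names `timespec_add_ns`, `timespec_diff_ns`, `cycle_stats` and `run_cycles` are hypothetical. The sketch also uses CLOCK_MONOTONIC instead of the CLOCK_REALTIME used in the post; that substitution is an assumption, made because CLOCK_MONOTONIC is not affected by wall-clock adjustments.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Worst-case values observed over a run (hypothetical stats record). */
struct cycle_stats {
    int64_t max_jitter_ns; /* wake-up time minus programmed deadline */
    int64_t max_exec_ns;   /* time spent in the job itself */
};

/* Advance an absolute deadline by ns (less than one second),
 * keeping tv_nsec normalized in [0, 1e9). */
void timespec_add_ns(struct timespec *t, long ns)
{
    assert(ns >= 0 && ns < NSEC_PER_SEC);
    t->tv_nsec += ns;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Signed difference a - b in nanoseconds. */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (a->tv_nsec - b->tv_nsec);
}

/* Run do_job() n times, once per period_ns (250000 for 4000Hz), and record
 * the worst jitter and execution time. Returns 0 or an errno-style value. */
int run_cycles(int n, long period_ns, void (*do_job)(void),
               struct cycle_stats *st)
{
    struct timespec next, start, stop;
    int rc;

    st->max_jitter_ns = 0;
    st->max_exec_ns = 0;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (int i = 0; i < n; i++) {
        timespec_add_ns(&next, period_ns);
        /* Absolute sleep: retry with the same deadline if interrupted. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return rc;

        clock_gettime(CLOCK_MONOTONIC, &start);
        do_job();
        clock_gettime(CLOCK_MONOTONIC, &stop);

        int64_t jitter = timespec_diff_ns(&start, &next);
        int64_t exec = timespec_diff_ns(&stop, &start);
        if (jitter > st->max_jitter_ns)
            st->max_jitter_ns = jitter;
        if (exec > st->max_exec_ns)
            st->max_exec_ns = exec;
    }
    return 0;
}
```

Because the deadline is always advanced from the previous deadline rather than from the actual wake-up time, a late wake-up in one cycle does not shift the following periods.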
Attachment: background_thread_on_core_0.png (PNG image)
Attachment: expected_behavior.png (PNG image)
Attachment: no_background_thread.png (PNG image)