[PATCH 0/1] nohz_full state entry failure in preempt_rt and proposed fix

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Ramesh Thomas <ramesh.thomas@xxxxxxxxx>

Hello,

This addresses an issue we have been facing with preempt_rt kernels not
able to enter nohz_full state consistently. Following are my debug
findings and details of a tool I had devloped that can help reproduce
the issue. Following patch has a proposed fix or at least a pointer to
areas worth looking into. We need preempt_rt for determinism and being
able to use nohz_full along with it is very valuable.

Problem:
Sometimes nohz_full state is never entered even when all necessary
conditions are met. It is easier to reproduce the issue in preempt_rt
kernel, however it may not be limited to preempt_rt.

Debug findings and proposed fix:
Observed that in the failure condition, entry into nohz_full state is
repeatedly aborted due to the detection of a pending timer event in the
next period in tick_nohz_next_event(). The issue is not reproduceable if
tick stoppage is not bailed out here. The skipping of the bailing out
code is done only if CONFIG_NO_HZ_FULL is defined. Since in nohz_full
mode, idle state is not entered when ticks are being stopped, aborting
tick stoppage may not be necessary. It is simpler to let the common code
that handles reprogramming of the timer at tick_nohz_stop_tick() handle
the next tick. 

Environment to reproduce:
Used Intel NUCs with 4 cores (Apollo Lake and Tiger Lake). Easier to
reproduce in embedded platforms. 

Kernel version:5.10.1-rt19 (This is not a new issue and I have seen it
in rt kernel version 4.17)

Relevant kernel config flags:
- NO_HZ_FULL=y
- PREEMPT_RT=y
- CPU_ISOLATION=y
- RCU_NOCB_CPU=y

Relevant kernel boot parameters:
- isolcpus=nohz,domain,1,3 nohz_full=1,3 rcu_nocbs=1,3 irqaffinity=0
- cpufreq.off=1 idle=poll cpuidle.off=1

Steps to reproduce:
1. Disable rt throttling assigning 100% scheduler period to rt tasks
2. Set cpu affinity to one of the nohz_full cpus
3. Set scheduling policy to sched_fifo with max priority
4. Wait till "tick_stopped" gets set in /proc/timer_list
5. Return failure if tick is not stopped in 15 seconds
6. Run above steps in a loop to stress it. It may take a while to
reproduce.

The above can be done using a tool I had developed as part of a
framework to assist setting up CPU thread isolation and measuring
jitter. It can be found at https://github.com/intel/tif

Build the tif_test app and run as follows using the included script to
run it in a loop till the failure is reproduced. 

make test
./tif_stress.sh

e.g. output. 
"Test# 818
Successfully entered nohz state in 724us

Error entering nohz state after 15000102us
Reproduced NOHZ_FULL failure after 818 tries!!!
Test elapsed time: 835 seconds"

(PS: The framework has a workaround for the issue which is not used in
the test. The workaround that helped was, switching CPU affinity in and
out of the nohz_full CPU giving the scheduler a fresh start)


Ramesh Thomas (1):
  dynticks/preempt_rt: Fix a nohz_full entry failure in preempt_rt

 kernel/time/tick-sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.26.2




[Index of Archives]     [RT Stable]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux