Hi Song,

On Thu, Jan 28, 2021 at 10:12 AM chensong_2000@xxxxxx
<chensong_2000@xxxxxx> wrote:
> Dear experts,
>
> When i was running cyclictest with ltp(kernel 4.19.90 rt, arch64),
> cyclictest got a big latency(more than 10ms), the ftrace log shows that
> modprobe called preempt_disable for a long time, which stops cyclictest
> being switched.

This works as designed. Running a latency-sensitive application has some
inherent rules. You need to stick to those rules, otherwise excessive
latencies are to be expected. The best documentation (I believe) can be
found at https://wiki.linuxfoundation.org/realtime/start (formerly
https://rt.wiki.kernel.org/).

In short, running modprobe (loading kernel modules in general) is not
allowed. For a production system, you are supposed to set up the machine
first (loading the modules during boot or before starting the
application). The application itself must restrict itself to the allowed
operations, otherwise it has to expect blocking and unbounded latencies.

> here is a piece of log:
>
> modprobe-4551 0...10 712.754529: preempt_disable:
> caller=0xffff0001019900c8 parent=0xffff0001019900c8
> ...
> modprobe-4551 0d..10 712.754589: irq_disable: caller=el1_irq+0x7c
> parent=0xffff0001019900dc
> modprobe-4551 0d.h10 712.754590: irq_handler_entry: irq=2
> name=arch_timer
> modprobe-4551 0d.h20 712.754591: hrtimer_cancel:
> hrtimer=0xffff8020f4983d98
> modprobe-4551 0d.h10 712.754591: hrtimer_expire_entry:
> hrtimer=0xffff8020f4983d98 now=712720218245 function=hrtimer_wakeup/0x0
> modprobe-4551 0d.h20 712.754592: sched_waking: comm=cyclictest
> pid=2212 prio=9 target_cpu=000
> modprobe-4551 0dNh30 712.754593: sched_wakeup: cyclictest:2212 [9]
> success=1 CPU:000
> modprobe-4551 0dNh10 712.754594: hrtimer_expire_exit:
> hrtimer=0xffff8020f4983d98
> modprobe-4551 0dNh10 712.754594: irq_handler_exit: irq=2
> ret=handled

Again, this is perfectly valid.
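
To illustrate why the trace is valid: cyclictest is woken inside the
timer interrupt, but on return from the interrupt the kernel switches
tasks only if the interrupted task has a zero preempt count and its
need-resched flag is set. Below is a rough user-space model of that
decision. It is a sketch for illustration only, not kernel code; the
struct and function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the interrupted task's state (not a kernel type). */
struct task_model {
	int preempt_count;   /* > 0 while preemption is disabled */
	bool need_resched;   /* set when a higher-priority task was woken */
};

/*
 * Models the check made on return from an interrupt to kernel code:
 * preempt only if preemption is enabled AND a reschedule is pending.
 */
static bool may_preempt_on_irq_exit(const struct task_model *t)
{
	assert(t != NULL);
	return t->preempt_count == 0 && t->need_resched;
}
```

With preempt_count = 1 and need_resched = true (the situation in the log
above, where modprobe has preemption disabled), the function returns
false, so the woken cyclictest has to wait until modprobe re-enables
preemption.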
> Further i found
> preemptirq_delay_test(kernel/trace/preemptirq_delay_test.c) can
> reproduce it easily and got the similar ftrace log.

You can find endless ways to introduce latency into the system. That's
why some operations are not allowed. For more details, see the
documentation mentioned above.

> The root cause i found is in el1_irq, it checks preempt count and
> TIF_NEED_RESCHED before it goes through the path el1_preempt,
> preempt_disable just stops it happening.

As it should.

> Then i came up an idea and did an experiment, call
> preempt_count_set(0); and set_tsk_need_resched(task); in hrtimer_wakeup
> to meet the expectation in order to reschedule.

Sounds like demolishing the pillars of the bridge, yet still expecting
it not to fall. This will break your system for sure.

> static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer)
> {
>         struct hrtimer_sleeper *t =
>                 container_of(timer, struct hrtimer_sleeper, timer);
>         struct task_struct *task = t->task;
>
>         t->task = NULL;
>         if (task)
>                 wake_up_process(task);
>
>         if(preempt_count())
>                 preempt_count_set(0);
>         set_tsk_need_resched(task);
>
>         return HRTIMER_NORESTART;
> }
>
> but, failed, system froze, no panic information.

Not surprised.

> Is there anyone having the same problem? i would appreciate it if you
> could share me some information in this case, many thanks.

Simply put: no modprobe while running cyclictest. Cyclictest is supposed
to run on a machine as if the machine were in production, and definitely
not tortured by arbitrary synthetic tests (unless the test simulates a
real-world scenario which can actually happen in production).

Again, all the modules should be preloaded before running cyclictest or
the production application. A realtime system is a fragile beast and you
have to handle it with special care.

Have a nice day,
Daniel

> BR
>
> Song Chen
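
P.S. One way to enforce the "preload everything first" rule is to have
the application check for the modules it depends on before it enters its
time-critical loop. The helper below is only a sketch under my own
assumptions: module_listed() is a made-up name, and it expects to be
given text in the layout of /proc/modules (module name as the first
space-separated field of each line). Built-in drivers do not show up
there, so treat a miss as a hint rather than proof.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `name` is the first field of any
 * line in `text`, where `text` is laid out like /proc/modules.
 */
static bool module_listed(const char *text, const char *name)
{
	assert(text != NULL && name != NULL);

	size_t len = strlen(name);
	const char *line = text;

	if (len == 0)
		return false;

	while (line != NULL && *line != '\0') {
		/* Match a whole first field: the name followed by a space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return true;
		line = strchr(line, '\n');
		if (line != NULL)
			line++;	/* step past the newline to the next entry */
	}
	return false;
}
```

The application would read /proc/modules into a buffer once during
setup, call the helper for each required module, and refuse to start the
realtime part if one is missing, instead of triggering a module load
later.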