Hello, * While running oslat(8) tests in a guest running on RHEL kernel-rt host, I'm seeing occasional peculiar latency spikes as below === # trace-cmd record -p nop --poll -M 1E -e all /home/test/rt-tests/oslat --cpu-list 1-4 --rtprio 1 --duration 12h -T 30 oslat V 2.30 INFO: debugfs mountpoint: /sys/kernel/debug/tracing/ Total runtime: 43200 seconds Thread priority: SCHED_FIFO:1 CPU list: 1-4 CPU for main thread: 0 Workload: no Workload mem: 0 (KiB) Preheat cores: 4 Pre-heat for 1 seconds... Test starts... /home/test/rt-tests/oslat: Trace threshold (30 us) triggered on cpu 2 with 67 us! Stopping the test. ... <idle>-0 [002] 265.041487: tlb_flush: pages=0 reason=flush on task switch (0) oslat-16074 [002] 265.041489: print: tracing_mark_write: /home/test/rt-tests/oslat: Starting thread on CPU 2. oslat-16074 [002] 265.041490: kfree: (__audit_syscall_exit+0x249) call_site=ffffffff95fb8889 ptr=(nil) oslat-16074 [002] 265.041490: sys_exit: NR 1 = 53 oslat-16074 [002] 265.041490: sys_exit_write: 0x35 oslat-16074 [002] 265.041490: user_enter: oslat-16074 [002] 300.576710: user_exit: oslat-16074 [002] 300.576734: irq_handler_entry: irq=21 name=virtio3 oslat-16074 [002] 300.576734: irq_handler_exit: irq=21 ret=handled oslat-16074 [002] 300.576734: sched_waking: comm=irq/21-virtio3 pid=219 prio=49 target_cpu=002 oslat-16074 [002] 300.576736: sched_migrate_task: comm=irq/21-virtio3 pid=219 prio=49 orig_cpu=2 dest_cpu=0 oslat-16074 [002] 300.576740: write_msr: 830, value fd oslat-16074 [002] 300.576740: sched_wakeup: irq/21-virtio3:219 [49] success=1 CPU:000 oslat-16074 [002] 300.576742: read_msr: 819, value 4 oslat-16074 [002] 300.576754: irq_matrix_alloc: bit=38 cpu=0 online=1 avl=195 alloc=6 managed=1 online_maps=6 global_avl=1178, global_rsvd=10, total_alloc=28 oslat-16074 [002] 300.576755: vector_update: irq=21 vector=38 cpu=0 prev_vector=34 prev_cpu=2 oslat-16074 [002] 300.576755: vector_alloc: irq=21 vector=38 reserved=0 ret=0 oslat-16074 [002] 300.576756: vector_config: irq=21 vector=38 cpu=0 apicdest=0x00000000 oslat-16074 [002] 300.576773: user_enter: oslat-16074 [002] 300.576778: user_exit: oslat-16074 [002] 300.576778: sys_enter: NR 1 (3, 7fcbb683c300, 65, 0, 0, 7fcbb683b816) oslat-16074 [002] 300.576778: sys_enter_write: fd: 0x00000003, buf: 0x7fcbb683c300, count: 0x00000065 oslat-16074 [002] 300.576780: print: tracing_mark_write: /home/test/rt-tests/oslat: Trace threshold (30 us) triggered on cpu 2 with 67 us! Stopping the test. oslat-16074 [002] 300.576780: kfree: (__audit_syscall_exit+0x249) call_site=ffffffff95fb8889 ptr=(nil) oslat-16074 [002] 300.576781: sys_exit: NR 1 = 101 oslat-16074 [002] 300.576781: sys_exit_write: 0x65 oslat-16074 [002] 300.576781: user_enter: === * Looking at these traces it is difficult to see/conclude the source of the said (67us) latency spike. * Other logs I have for the same host/guest show _call_function_entry/exit or timer softirq entries in traces. * Peculiar is that these spikes seem to occur when oslat(8) is invoked after the guest boot-up. Subsequent oslat(8) sessions in the same guest run over 12 hrs without any sign of latency spike. * Is there a way to find further clues or conclusively figure out the source of such occasional latency spikes? Thank you. --- - P J P