Hello Sebastian, all, I found another one of these in my logs: Jul 19 05:33:05 vuurmuur named[2363288]: dispatch 0x7f1e3d7fa6c0: shutting down due to TCP receive error: 2001:500:200::b#53: connection reset Jul 20 02:20:24 vuurmuur kernel: 002: rcu: INFO: rcu_preempt self-detected stall on CPU Jul 20 02:20:24 vuurmuur kernel: 002: rcu: 2-....: (5250 ticks this GP) idle=5c6/1/0x4000000000000002 softirq=401431806/401431806 fqs=2372 Jul 20 02:20:24 vuurmuur kernel: 002: (t=5250 jiffies g=701989125 q=336) Jul 20 02:20:24 vuurmuur kernel: 002: NMI backtrace for cpu 2 Jul 20 02:20:24 vuurmuur kernel: 002: CPU: 2 PID: 3468730 Comm: ntpd Tainted: G O 5.4.13-rt7 #9 Jul 20 02:20:24 vuurmuur kernel: 002: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./QC5000M-ITX/PH, BIOS P1.10 05/06/2015 Jul 20 02:20:24 vuurmuur kernel: 002: Call Trace: Jul 20 02:20:24 vuurmuur kernel: 002: <IRQ> Jul 20 02:20:24 vuurmuur kernel: 002: dump_stack+0x50/0x70 Jul 20 02:20:24 vuurmuur kernel: 002: nmi_cpu_backtrace.cold+0x14/0x53 Jul 20 02:20:24 vuurmuur kernel: 002: ? lapic_can_unplug_cpu.cold+0x3b/0x3b Jul 20 02:20:24 vuurmuur kernel: 002: nmi_trigger_cpumask_backtrace+0x8e/0xa2 Jul 20 02:20:24 vuurmuur kernel: 002: rcu_dump_cpu_stacks+0x8b/0xb9 Jul 20 02:20:24 vuurmuur kernel: 002: rcu_sched_clock_irq.cold+0x17e/0x4fd Jul 20 02:20:24 vuurmuur kernel: 002: ? account_system_index_time+0xa6/0xd0 Jul 20 02:20:24 vuurmuur kernel: 002: update_process_times+0x1f/0x50 Jul 20 02:20:24 vuurmuur kernel: 002: tick_sched_timer+0x5a/0x1c0 Jul 20 02:20:24 vuurmuur kernel: 002: ? tick_switch_to_oneshot.cold+0x74/0x74 Jul 20 02:20:24 vuurmuur kernel: 002: __hrtimer_run_queues+0xba/0x1b0 Jul 20 02:20:24 vuurmuur kernel: 002: hrtimer_interrupt+0x108/0x230 Jul 20 02:20:24 vuurmuur kernel: 002: smp_apic_timer_interrupt+0x61/0xa0 Jul 20 02:20:24 vuurmuur kernel: 002: apic_timer_interrupt+0xf/0x20 Jul 20 02:20:24 vuurmuur kernel: 002: </IRQ> Jul 20 02:20:24 vuurmuur kernel: 002: RIP: 0010:lock_release+0x114/0x1a0 Jul 20 02:20:24 vuurmuur kernel: 002: Code: 0a 50 6e 01 75 0e 44 2b 7c 24 08 44 39 bd a8 07 00 00 75 54 65 48 8b 04 25 00 5e 01 00 c7 80 ac 07 00 00 00 00 00 00 41 56 9d <48> 8b 44 24 10 65 48 33 04 25 28 00 00 00 75 73 48 8b 6c 24 18 4c Jul 20 02:20:24 vuurmuur kernel: 002: RSP: 0018:ffffc90001ebb8f0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 Jul 20 02:20:24 vuurmuur kernel: 002: RAX: ffff888054294140 RBX: 000000000000001f RCX: ffffc90001ebb8fc Jul 20 02:20:24 vuurmuur kernel: 002: RDX: 0000000000000000 RSI: ffffffff8203bee0 RDI: ffff8880542948f0 Jul 20 02:20:24 vuurmuur kernel: 002: RBP: ffff888054294140 R08: 0000000000020019 R09: 0000000000000000 Jul 20 02:20:24 vuurmuur kernel: 002: R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8203bee0 Jul 20 02:20:24 vuurmuur kernel: 002: R13: ffffffff8117b0a2 R14: 0000000000000246 R15: 0000000000000001 Jul 20 02:20:24 vuurmuur kernel: 002: ? do_select+0x132/0x7a0 Jul 20 02:20:24 vuurmuur kernel: 002: do_select+0x14f/0x7a0 Jul 20 02:20:24 vuurmuur kernel: 002: ? select_estimate_accuracy+0x100/0x100 Jul 20 02:20:24 vuurmuur kernel: 002: ? poll_select_finish+0x1d0/0x1d0 Jul 20 02:20:24 vuurmuur kernel: 002: ? poll_select_finish+0x1d0/0x1d0 Jul 20 02:20:24 vuurmuur kernel: 002: ? find_held_lock+0x2b/0x80 Jul 20 02:20:24 vuurmuur kernel: 002: ? put_prev_task_rt+0x22/0x140 Jul 20 02:20:24 vuurmuur kernel: 002: ? find_held_lock+0x2b/0x80 Jul 20 02:20:24 vuurmuur kernel: 002: ? __schedule+0x435/0x4f0 Jul 20 02:20:24 vuurmuur kernel: 002: ? _raw_spin_unlock_irq+0x1f/0x40 Jul 20 02:20:24 vuurmuur kernel: 002: ? __schedule+0x435/0x4f0 Jul 20 02:20:24 vuurmuur kernel: 002: ? preempt_schedule_irq+0x31/0x50 Jul 20 02:20:24 vuurmuur kernel: 002: ? retint_kernel+0x1b/0x1b Jul 20 02:20:24 vuurmuur kernel: 002: ? find_held_lock+0x2b/0x80 Jul 20 02:20:24 vuurmuur kernel: 002: ? core_sys_select+0x5c/0x380 Jul 20 02:20:24 vuurmuur kernel: 002: core_sys_select+0x1d0/0x380 Jul 20 02:20:24 vuurmuur kernel: 002: ? core_sys_select+0x5c/0x380 Jul 20 02:20:24 vuurmuur kernel: 002: ? tty_ldisc_ref_wait+0x27/0x70 Jul 20 02:20:24 vuurmuur kernel: 002: ? __ldsem_down_read_nested+0x5e/0x240 Jul 20 02:20:24 vuurmuur kernel: 002: ? find_held_lock+0x2b/0x80 Jul 20 02:20:24 vuurmuur kernel: 002: ? find_held_lock+0x2b/0x80 Jul 20 02:20:24 vuurmuur kernel: 002: ? set_user_sigmask+0x62/0x90 Jul 20 02:20:24 vuurmuur kernel: 002: __x64_sys_pselect6+0x141/0x190 Jul 20 02:20:24 vuurmuur kernel: 002: ? _raw_spin_unlock_irq+0x1f/0x40 Jul 20 02:20:24 vuurmuur kernel: 002: ? sigprocmask+0x6d/0x90 Jul 20 02:20:24 vuurmuur kernel: 002: do_syscall_64+0x77/0x440 Jul 20 02:20:24 vuurmuur kernel: 002: ? nmi_handle+0xc1/0xe0 Jul 20 02:20:24 vuurmuur kernel: 002: ? unregister_nmi_handler+0xb0/0xb0 Jul 20 02:20:24 vuurmuur kernel: 002: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 20 02:20:24 vuurmuur kernel: 002: RIP: 0033:0x7fba5ae5a096 Jul 20 02:20:24 vuurmuur kernel: 002: Code: e8 9f dd f8 ff 4c 8b 0c 24 4c 8b 44 24 08 89 c5 4c 8b 54 24 28 48 8b 54 24 20 b8 0e 01 00 00 48 8b 74 24 18 8b 7c 24 14 0f 05 <48> 3d 00 f0 ff ff 77 28 89 ef 89 04 24 e8 c8 dd f8 ff 8b 04 24 eb Jul 20 02:20:24 vuurmuur kernel: 002: RSP: 002b:00007ffcff164a10 EFLAGS: 00000293 ORIG_RAX: 000000000000010e Jul 20 02:20:24 vuurmuur kernel: 002: RAX: ffffffffffffffda RBX: 00005613a62cde78 RCX: 00007fba5ae5a096 Jul 20 02:20:24 vuurmuur kernel: 002: RDX: 0000000000000000 RSI: 00007ffcff164b00 RDI: 000000000000001f Jul 20 02:20:24 vuurmuur kernel: 002: RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffcff164a50 Jul 20 02:20:24 vuurmuur kernel: 002: R10: 0000000000000000 R11: 0000000000000293 R12: 00005613a62a9c43 Jul 20 02:20:24 vuurmuur kernel: 002: R13: 0000000000000009 R14: ffffffffffffffff R15: 00005613a62a9e1c Jul 20 02:20:24 vuurmuur kernel: rcu: INFO: rcu_preempt self-detected stall on CPU Jul 20 02:20:24 vuurmuur kernel: rcu: #0112-....: (5250 ticks this GP) idle=5c6/1/0x4000000000000002 softirq=401431806/401431806 fqs=2372 Jul 20 02:20:24 vuurmuur kernel: #011(t=5250 jiffies g=701989125 q=336) Jul 20 02:20:24 vuurmuur kernel: NMI backtrace for cpu 2 Jul 20 02:20:24 vuurmuur kernel: CPU: 2 PID: 3468730 Comm: ntpd Tainted: G O 5.4.13-rt7 #9 Jul 20 02:20:24 vuurmuur kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./QC5000M-ITX/PH, BIOS P1.10 05/06/2015 Jul 20 02:20:24 vuurmuur kernel: Call Trace: Jul 20 02:20:24 vuurmuur kernel: <IRQ> Jul 20 02:20:24 vuurmuur kernel: dump_stack+0x50/0x70 Jul 20 02:20:24 vuurmuur kernel: nmi_cpu_backtrace.cold+0x14/0x53 Jul 20 02:20:24 vuurmuur kernel: ? lapic_can_unplug_cpu.cold+0x3b/0x3b Jul 20 02:20:24 vuurmuur kernel: nmi_trigger_cpumask_backtrace+0x8e/0xa2 Jul 20 02:20:24 vuurmuur kernel: rcu_dump_cpu_stacks+0x8b/0xb9 Jul 20 02:20:24 vuurmuur kernel: rcu_sched_clock_irq.cold+0x17e/0x4fd Jul 20 02:20:24 vuurmuur kernel: ? account_system_index_time+0xa6/0xd0 Jul 20 02:20:24 vuurmuur kernel: update_process_times+0x1f/0x50 Jul 20 02:20:24 vuurmuur kernel: tick_sched_timer+0x5a/0x1c0 Jul 20 02:20:24 vuurmuur kernel: ? tick_switch_to_oneshot.cold+0x74/0x74 Jul 20 02:20:24 vuurmuur kernel: __hrtimer_run_queues+0xba/0x1b0 Jul 20 02:20:24 vuurmuur kernel: hrtimer_interrupt+0x108/0x230 Jul 20 02:20:24 vuurmuur kernel: smp_apic_timer_interrupt+0x61/0xa0 Jul 20 02:20:24 vuurmuur kernel: apic_timer_interrupt+0xf/0x20 Jul 20 02:20:24 vuurmuur kernel: </IRQ> Jul 20 02:20:24 vuurmuur kernel: RIP: 0010:lock_release+0x114/0x1a0 Jul 20 02:20:24 vuurmuur kernel: Code: 0a 50 6e 01 75 0e 44 2b 7c 24 08 44 39 bd a8 07 00 00 75 54 65 48 8b 04 25 00 5e 01 00 c7 80 ac 07 00 00 00 00 00 00 41 56 9d <48> 8b 44 24 10 65 48 33 04 25 28 00 00 00 75 73 48 8b 6c 24 18 4c Jul 20 02:20:24 vuurmuur kernel: RSP: 0018:ffffc90001ebb8f0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 Jul 20 02:20:24 vuurmuur kernel: RAX: ffff888054294140 RBX: 000000000000001f RCX: ffffc90001ebb8fc Jul 20 02:20:24 vuurmuur kernel: RDX: 0000000000000000 RSI: ffffffff8203bee0 RDI: ffff8880542948f0 Jul 20 02:20:24 vuurmuur kernel: RBP: ffff888054294140 R08: 0000000000020019 R09: 0000000000000000 Jul 20 02:20:24 vuurmuur kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8203bee0 Jul 20 02:20:24 vuurmuur kernel: R13: ffffffff8117b0a2 R14: 0000000000000246 R15: 0000000000000001 Jul 20 02:20:24 vuurmuur kernel: ? do_select+0x132/0x7a0 Jul 20 02:20:24 vuurmuur kernel: do_select+0x14f/0x7a0 Jul 20 02:20:24 vuurmuur kernel: ? select_estimate_accuracy+0x100/0x100 Jul 20 02:20:24 vuurmuur kernel: ? poll_select_finish+0x1d0/0x1d0 Jul 20 02:20:24 vuurmuur kernel: ? poll_select_finish+0x1d0/0x1d0 Jul 20 02:20:24 vuurmuur kernel: ? find_held_lock+0x2b/0x80 Jul 20 02:20:24 vuurmuur kernel: ? put_prev_task_rt+0x22/0x140 Jul 20 02:20:24 vuurmuur kernel: ? find_held_lock+0x2b/0x80 Jul 20 02:20:24 vuurmuur kernel: ? __schedule+0x435/0x4f0 Jul 20 02:20:24 vuurmuur kernel: ? _raw_spin_unlock_irq+0x1f/0x40 Jul 20 02:20:24 vuurmuur kernel: ? __schedule+0x435/0x4f0 Jul 20 02:20:24 vuurmuur kernel: ? preempt_schedule_irq+0x31/0x50 Jul 20 02:20:24 vuurmuur kernel: ? retint_kernel+0x1b/0x1b Jul 20 02:20:24 vuurmuur kernel: ? find_held_lock+0x2b/0x80 Jul 20 02:20:24 vuurmuur kernel: ? core_sys_select+0x5c/0x380 Jul 20 02:20:24 vuurmuur kernel: core_sys_select+0x1d0/0x380 Jul 20 02:20:24 vuurmuur kernel: ? core_sys_select+0x5c/0x380 Jul 20 02:20:24 vuurmuur kernel: ? tty_ldisc_ref_wait+0x27/0x70 Jul 20 02:20:24 vuurmuur kernel: ? __ldsem_down_read_nested+0x5e/0x240 Jul 20 02:20:24 vuurmuur kernel: ? find_held_lock+0x2b/0x80 Jul 20 02:20:24 vuurmuur kernel: ? find_held_lock+0x2b/0x80 Jul 20 02:20:24 vuurmuur kernel: ? set_user_sigmask+0x62/0x90 Jul 20 02:20:24 vuurmuur kernel: __x64_sys_pselect6+0x141/0x190 Jul 20 02:20:24 vuurmuur kernel: ? _raw_spin_unlock_irq+0x1f/0x40 Jul 20 02:20:24 vuurmuur kernel: ? sigprocmask+0x6d/0x90 Jul 20 02:20:24 vuurmuur kernel: do_syscall_64+0x77/0x440 Jul 20 02:20:24 vuurmuur kernel: ? nmi_handle+0xc1/0xe0 Jul 20 02:20:24 vuurmuur kernel: ? unregister_nmi_handler+0xb0/0xb0 Jul 20 02:20:24 vuurmuur kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 20 02:20:24 vuurmuur kernel: RIP: 0033:0x7fba5ae5a096 Jul 20 02:20:24 vuurmuur kernel: Code: e8 9f dd f8 ff 4c 8b 0c 24 4c 8b 44 24 08 89 c5 4c 8b 54 24 28 48 8b 54 24 20 b8 0e 01 00 00 48 8b 74 24 18 8b 7c 24 14 0f 05 <48> 3d 00 f0 ff ff 77 28 89 ef 89 04 24 e8 c8 dd f8 ff 8b 04 24 eb Jul 20 02:20:24 vuurmuur kernel: RSP: 002b:00007ffcff164a10 EFLAGS: 00000293 ORIG_RAX: 000000000000010e Jul 20 02:20:24 vuurmuur kernel: RAX: ffffffffffffffda RBX: 00005613a62cde78 RCX: 00007fba5ae5a096 Jul 20 02:20:24 vuurmuur kernel: RDX: 0000000000000000 RSI: 00007ffcff164b00 RDI: 000000000000001f Jul 20 02:20:24 vuurmuur kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffcff164a50 Jul 20 02:20:24 vuurmuur kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00005613a62a9c43 Jul 20 02:20:24 vuurmuur kernel: R13: 0000000000000009 R14: ffffffffffffffff R15: 00005613a62a9e1c Jul 20 05:03:01 vuurmuur named[2363288]: received control channel command 'flush' Again it appears to concern ntpd. Still on 5.4.13-rt7. Should I boot into a newer kernel to verify the issue is still there? Kind regards, Udo On 07-07-2020 19:47, Sebastian Andrzej Siewior wrote: > On 2020-07-04 05:43:32 [+0200], Udo van den Heuvel wrote: >>> I *think* this happened within the loop in __fget_files(). This function >>> is inlined by __fget_light() and the loop has a RCU-section so it would >>> make sense. >>> Do you run something at an elevated priority in the system? >> >> ntpd (ntpsec) elevates itself. > > To the highest possible. Wonderful. > >>> I don't know >>> what the other part was doing but somehow one of the file descriptors >>> (network sockets probably) was about to be closed while the other side >>> tried to poll() on it. >> >> Thanks for explaining. > > I'm puzzled. On dup() and otherwise on replacement / closing the file > pointer is replaced (with an incremented reference count or NULL ) and > then the reference count of the old file pointer is decremented. So the > zero count could be observed once and then the new pointer should be > seen. So it should not loop for seconds. And then, dup2() is only used > at startup.