Hi, is there anything I can do to raise interest on this one? :) > On 19. Jun 2023, at 10:29, bugzilla-daemon@xxxxxxxxxx wrote: > > https://bugzilla.kernel.org/show_bug.cgi?id=217572 > > Bug ID: 217572 > Summary: Initial blocked tasks causing deterioration over hours > until (nearly) complete system lockup and data loss > with PostgreSQL 13 > Product: File System > Version: 2.5 > Hardware: All > OS: Linux > Status: NEW > Severity: high > Priority: P3 > Component: XFS > Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx > Reporter: ct@xxxxxxxxxxxxxxx > Regression: No > > Last Friday we experienced the following hung task messages with PostgreSQL > while performing our nightly backup using pg_dump. Normally this takes at most > a few minutes with IO being stressed. This time it caused high SYS CPU time, > went on for almost 20 minutes and caused the PostgreSQL dump to fail with > inconsistent data. > > Around 3:50 AM we got this: > > [330289.821046] INFO: task .postgres-wrapp:11884 blocked for more than 122 > seconds. > [330289.821830] Not tainted 6.1.31 #1-NixOS > [330289.822285] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [330289.823098] task:.postgres-wrapp state:D stack:0 pid:11884 ppid:11881 > flags:0x00000002 > [330289.823911] Call Trace: > [330289.824221] <TASK> > [330289.824451] __schedule+0x35d/0x1370 > [330289.824858] ? mntput_no_expire+0x4a/0x250 > [330289.825307] schedule+0x5d/0xe0 > [330289.825630] rwsem_down_write_slowpath+0x34e/0x730 > [330289.826128] xfs_ilock+0xeb/0xf0 [xfs] > [330289.826599] xfs_file_buffered_write+0x119/0x300 [xfs] > [330289.827212] ? selinux_file_permission+0x10b/0x150 > [330289.827683] vfs_write+0x244/0x400 > [330289.828049] __x64_sys_pwrite64+0x94/0xc0 > [330289.828459] do_syscall_64+0x3a/0x90 > [330289.828801] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [330289.829302] RIP: 0033:0x7ff8de90e7f7 > [330289.829681] RSP: 002b:00007fff52069b08 EFLAGS: 00000202 ORIG_RAX: > 0000000000000012 > [330289.830408] RAX: ffffffffffffffda RBX: 0000562bb434f510 RCX: > 00007ff8de90e7f7 > [330289.831073] RDX: 0000000000002000 RSI: 00007ff888b52e80 RDI: > 000000000000003b > [330289.831762] RBP: 00000000000021b0 R08: 000000000a000010 R09: > 0000000000000040 > [330289.832440] R10: 0000000004ed8000 R11: 0000000000000202 R12: > 0000000000002000 > [330289.833130] R13: 0000000004ed8000 R14: 00007ff8de8176c8 R15: > 0000562bb434af75 > [330289.833803] </TASK> > [330289.834064] INFO: task .postgres-wrapp:1245532 blocked for more than 122 > seconds. > [330289.834771] Not tainted 6.1.31 #1-NixOS > [330289.835209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [330289.835926] task:.postgres-wrapp state:D stack:0 pid:1245532 ppid:11881 > flags:0x00000002 > [330289.836752] Call Trace: > [330289.837010] <TASK> > [330289.837258] __schedule+0x35d/0x1370 > [330289.837622] ? page_add_file_rmap+0xba/0x2f0 > [330289.838047] ? do_set_pte+0x174/0x1c0 > [330289.838420] ? unix_stream_read_generic+0x223/0xa60 > [330289.838887] schedule+0x5d/0xe0 > [330289.839265] schedule_preempt_disabled+0x14/0x30 > [330289.839758] rwsem_down_read_slowpath+0x29e/0x4f0 > [330289.840521] down_read+0x47/0xb0 > [330289.840853] xfs_ilock+0x79/0xf0 [xfs] > [330289.841346] xfs_file_buffered_read+0x44/0xd0 [xfs] > [330289.841945] xfs_file_read_iter+0x6a/0xd0 [xfs] > [330289.842496] vfs_read+0x23c/0x310 > [330289.842845] __x64_sys_pread64+0x94/0xc0 > [330289.843303] do_syscall_64+0x3a/0x90 > [330289.843857] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [330289.844420] RIP: 0033:0x7ff8de90e747 > [330289.844898] RSP: 002b:00007fff5206aae8 EFLAGS: 00000202 ORIG_RAX: > 0000000000000011 > [330289.845705] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: > 00007ff8de90e747 > [330289.846488] RDX: 0000000000002000 RSI: 00007ff8896e0e80 RDI: > 0000000000000010 > [330289.847133] RBP: 0000000000000001 R08: 000000000a00000d R09: > 0000000000000000 > [330289.847627] R10: 0000000018fb8000 R11: 0000000000000202 R12: > 00007ff8cdd91278 > [330289.848113] R13: 0000562bb434af75 R14: 0000562bb434f510 R15: > 0000562bb5e89e70 > [330289.848631] </TASK> > > > This keeps going on for a few minutes until: > > [330584.618978] rcu: INFO: rcu_preempt self-detected stall on CPU > [330584.619413] rcu: 1-....: (20999 ticks this GP) > idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=5231 > [330584.620052] (t=21000 jiffies g=142859597 q=21168 ncpus=3) > [330584.620409] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31 > #1-NixOS > [330584.620880] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 > [330584.621588] RIP: 0010:xas_load+0xb/0x40 > [330584.621844] Code: 04 48 8b 44 c2 08 c3 cc cc cc cc 48 8b 07 48 8b 40 08 c3 > cc cc cc cc 0f 1f 84 00 00 00 00 00 e8 3b ff ff ff 48 89 c2 83 e2 03 <48> 83 fa > 02 75 08 48 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f > [330584.622996] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000202 > [330584.623337] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX: > 0000000000000002 > [330584.623787] RDX: 0000000000000002 RSI: ffff98871f9d2920 RDI: > ffffb427c3387c00 > [330584.624239] RBP: 000000000000f161 R08: ffffb427c3387e68 R09: > ffffda1444482000 > [330584.624686] R10: 0000000000000001 R11: 0000000000000001 R12: > 000000000000f161 > [330584.625136] R13: ffff9887ad3c2700 R14: 000000000000f160 R15: > ffffb427c3387e90 > [330584.625591] FS: 00007ff8de817800(0000) GS:ffff98887ac80000(0000) > knlGS:0000000000000000 > [330584.626097] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [330584.626467] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4: > 00000000000006e0 > [330584.626954] Call Trace: > [330584.627125] <IRQ> > [330584.627281] ? rcu_dump_cpu_stacks+0xc8/0x100 > [330584.627567] ? rcu_sched_clock_irq.cold+0x15b/0x2fb > [330584.627882] ? sched_slice+0x87/0x140 > [330584.628126] ? timekeeping_update+0xdd/0x130 > [330584.628414] ? __cgroup_account_cputime_field+0x5b/0xa0 > [330584.628751] ? update_process_times+0x77/0xb0 > [330584.629036] ? update_wall_time+0xc/0x20 > [330584.629300] ? tick_sched_handle+0x34/0x50 > [330584.629564] ? tick_sched_timer+0x6f/0x80 > [330584.629823] ? tick_sched_do_timer+0xa0/0xa0 > [330584.630103] ? __hrtimer_run_queues+0x112/0x2b0 > [330584.630404] ? hrtimer_interrupt+0xfe/0x220 > [330584.630678] ? __sysvec_apic_timer_interrupt+0x7f/0x170 > [330584.631016] ? sysvec_apic_timer_interrupt+0x99/0xc0 > [330584.631339] </IRQ> > [330584.631485] <TASK> > [330584.631631] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 > [330584.631974] ? xas_load+0xb/0x40 > [330584.632195] ? xas_load+0x30/0x40 > [330584.632415] filemap_get_read_batch+0x16e/0x250 > [330584.632710] filemap_get_pages+0xa9/0x630 > [330584.632972] ? memcg_check_events+0xda/0x210 > [330584.633259] ? free_unref_page_commit+0x7c/0x170 > [330584.633560] ? _raw_spin_unlock_irqrestore+0x23/0x40 > [330584.633880] ? free_unref_page+0x1ac/0x220 > [330584.634146] filemap_read+0xd2/0x340 > [330584.634389] xfs_file_buffered_read+0x4f/0xd0 [xfs] > [330584.634778] xfs_file_read_iter+0x6a/0xd0 [xfs] > [330584.635123] vfs_read+0x23c/0x310 > [330584.635354] __x64_sys_pread64+0x94/0xc0 > [330584.635609] do_syscall_64+0x3a/0x90 > [330584.635846] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [330584.636174] RIP: 0033:0x7ff8de90e747 > [330584.636437] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 > f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00 > f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24 > [330584.637582] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX: > 0000000000000011 > [330584.638056] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: > 00007ff8de90e747 > [330584.638506] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI: > 0000000000000230 > [330584.638954] RBP: 0000000000000001 R08: 000000000a00000d R09: > 0000000000000000 > [330584.639411] R10: 000000000f160000 R11: 0000000000000202 R12: > 00007ff8cdd92688 > [330584.639862] R13: 0000562bb434af75 R14: 0000562bb434f510 R15: > 0000562bb5fe04d0 > [330584.640313] </TASK> > [330584.721995] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: > { 1-.... } 21015 jiffies s: 7297 root: 0x2/. > [330584.722930] rcu: blocking rcu_node structures (internal RCU debug): > [330584.723428] Sending NMI from CPU 0 to CPUs 1: > [330584.723763] NMI backtrace for cpu 1 > [330584.723769] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31 > #1-NixOS > [330584.723772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 > [330584.723775] RIP: 0010:xas_load+0x29/0x40 > [330584.723783] Code: 00 e8 3b ff ff ff 48 89 c2 83 e2 03 48 83 fa 02 75 08 48 > 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f 10 48 8d 70 fe 38 48 fe <72> ee e8 > 20 fe ff ff 80 3e 00 75 d0 c3 cc cc cc cc 66 0f 1f 44 00 > [330584.723785] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000246 > [330584.723787] RAX: ffff98871f8dbdaa RBX: ffffb427c3387d70 RCX: > 0000000000000000 > [330584.723788] RDX: 0000000000000002 RSI: ffff98871f8dbda8 RDI: > ffffb427c3387c00 > [330584.723789] RBP: 000000000000f161 R08: ffffb427c3387e68 R09: > ffffda1444482000 > [330584.723790] R10: 0000000000000001 R11: 0000000000000001 R12: > 000000000000f161 > [330584.723791] R13: ffff9887ad3c2700 R14: 000000000000f160 R15: > ffffb427c3387e90 > [330584.723793] FS: 00007ff8de817800(0000) GS:ffff98887ac80000(0000) > knlGS:0000000000000000 > [330584.723794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [330584.723795] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4: > 00000000000006e0 > [330584.723798] Call Trace: > [330584.723801] <NMI> > [330584.723805] ? nmi_cpu_backtrace.cold+0x1b/0x76 > [330584.723810] ? nmi_cpu_backtrace_handler+0xd/0x20 > [330584.723814] ? nmi_handle+0x5d/0x120 > [330584.723817] ? xas_load+0x29/0x40 > [330584.723818] ? default_do_nmi+0x69/0x170 > [330584.723821] ? exc_nmi+0x13c/0x170 > [330584.723823] ? end_repeat_nmi+0x16/0x67 > [330584.723828] ? xas_load+0x29/0x40 > [330584.723830] ? xas_load+0x29/0x40 > [330584.723832] ? xas_load+0x29/0x40 > [330584.723834] </NMI> > [330584.723834] <TASK> > [330584.723835] filemap_get_read_batch+0x16e/0x250 > [330584.723840] filemap_get_pages+0xa9/0x630 > [330584.723842] ? memcg_check_events+0xda/0x210 > [330584.723845] ? free_unref_page_commit+0x7c/0x170 > [330584.723849] ? _raw_spin_unlock_irqrestore+0x23/0x40 > [330584.723851] ? free_unref_page+0x1ac/0x220 > [330584.723852] filemap_read+0xd2/0x340 > [330584.723857] xfs_file_buffered_read+0x4f/0xd0 [xfs] > [330584.723935] xfs_file_read_iter+0x6a/0xd0 [xfs] > [330584.723990] vfs_read+0x23c/0x310 > [330584.723995] __x64_sys_pread64+0x94/0xc0 > [330584.723997] do_syscall_64+0x3a/0x90 > [330584.724000] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [330584.724003] RIP: 0033:0x7ff8de90e747 > [330584.724019] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 > f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00 > f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24 > [330584.724020] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX: > 0000000000000011 > [330584.724021] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: > 00007ff8de90e747 > [330584.724022] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI: > 0000000000000230 > [330584.724023] RBP: 0000000000000001 R08: 000000000a00000d R09: > 0000000000000000 > [330584.724023] R10: 000000000f160000 R11: 0000000000000202 R12: > 00007ff8cdd92688 > [330584.724024] R13: 0000562bb434af75 R14: 0000562bb434f510 R15: > 0000562bb5fe04d0 > [330584.724026] </TASK> > > > This keeps repeating until around 4:03 which ends in systemd-journal getting > coredumped: > > [331277.846966] rcu: INFO: rcu_preempt self-detected stall on CPU > [331277.847413] rcu: 1-....: (713858 ticks this GP) > idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=311996 > [331277.848088] (t=714253 jiffies g=142859597 q=1821602 ncpus=3) > [331277.848462] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31 > #1-NixOS > [331277.848941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 > [331277.849667] RIP: 0010:xas_descend+0x22/0x70 > [331277.849952] Code: cc cc cc cc cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 > e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 <48> 83 f9 > 02 75 08 48 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc > [331277.851120] RSP: 0018:ffffb427c3387bf0 EFLAGS: 00000202 > [331277.851468] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX: > 0000000000000002 > [331277.851928] RDX: 000000000000000f RSI: ffff98871f9d2920 RDI: > ffffb427c3387c00 > [331277.852386] RBP: 000000000000f161 R08: ffffb427c3387e68 R09: > ffffda1444482000 > [331277.852847] R10: 0000000000000001 R11: 0000000000000001 R12: > 000000000000f161 > [331277.853303] R13: ffff9887ad3c2700 R14: 000000000000f160 R15: > ffffb427c3387e90 > [331277.853766] FS: 00007ff8de817800(0000) GS:ffff98887ac80000(0000) > knlGS:0000000000000000 > [331277.854278] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [331277.854653] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4: > 00000000000006e0 > [331277.855111] Call Trace: > [331277.855284] <IRQ> > [331277.855433] ? rcu_dump_cpu_stacks+0xc8/0x100 > [331277.855730] ? rcu_sched_clock_irq.cold+0x15b/0x2fb > [331277.856049] ? sched_slice+0x87/0x140 > [331277.856294] ? timekeeping_update+0xdd/0x130 > [331277.856587] ? __cgroup_account_cputime_field+0x5b/0xa0 > [331277.856930] ? update_process_times+0x77/0xb0 > [331277.857222] ? update_wall_time+0xc/0x20 > [331277.857482] ? tick_sched_handle+0x34/0x50 > [331277.857758] ? tick_sched_timer+0x6f/0x80 > [331277.858024] ? tick_sched_do_timer+0xa0/0xa0 > [331277.858306] ? __hrtimer_run_queues+0x112/0x2b0 > [331277.858613] ? hrtimer_interrupt+0xfe/0x220 > [331277.858891] ? __sysvec_apic_timer_interrupt+0x7f/0x170 > [331277.859235] ? sysvec_apic_timer_interrupt+0x99/0xc0 > [331277.859558] </IRQ> > [331277.859710] <TASK> > [331277.859857] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 > [331277.860206] ? xas_descend+0x22/0x70 > [331277.860446] xas_load+0x30/0x40 > [331277.860665] filemap_get_read_batch+0x16e/0x250 > [331277.860967] filemap_get_pages+0xa9/0x630 > [331277.861233] ? memcg_check_events+0xda/0x210 > [331277.861517] ? free_unref_page_commit+0x7c/0x170 > [331277.861834] ? _raw_spin_unlock_irqrestore+0x23/0x40 > [331277.862158] ? free_unref_page+0x1ac/0x220 > [331277.862427] filemap_read+0xd2/0x340 > [331277.862677] xfs_file_buffered_read+0x4f/0xd0 [xfs] > [331277.863072] xfs_file_read_iter+0x6a/0xd0 [xfs] > [331277.863424] vfs_read+0x23c/0x310 > [331277.863657] __x64_sys_pread64+0x94/0xc0 > [331277.863917] do_syscall_64+0x3a/0x90 > [331277.864159] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [331277.864488] RIP: 0033:0x7ff8de90e747 > [331277.864752] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 > f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00 > f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24 > [331277.865917] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX: > 0000000000000011 > [331277.866395] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: > 00007ff8de90e747 > [331277.866848] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI: > 0000000000000230 > [331277.867298] RBP: 0000000000000001 R08: 000000000a00000d R09: > 0000000000000000 > [331277.867750] R10: 000000000f160000 R11: 0000000000000202 R12: > 00007ff8cdd92688 > [331277.868199] R13: 0000562bb434af75 R14: 0000562bb434f510 R15: > 0000562bb5fe04d0 > [331277.868658] </TASK> > Fri Jun 16 04:03:23 AM CEST 2023 -- SERIAL CONSOLE IS LIVE -- > [331289.987247] systemd[1]: Starting Serial console liveness marker... > [331289.992645] systemd[1]: nscd.service: Deactivated successfully. > [331290.011838] systemd[1]: nscd.service: Consumed 3min 16.251s CPU time, > received 12.0M IP traffic, sent 8.2M IP traffic. > [331290.020289] systemd[1]: serial-console-liveness.service: Deactivated > successfully. > [331290.035818] systemd[1]: Finished Serial console liveness marker. > [331290.068776] systemd[1]: Started Logrotate Service. > [331290.069700] systemd[1]: systemd-journald.service: Watchdog timeout (limit > 3min)! > [331290.070475] systemd[1]: systemd-journald.service: Killing process 528 > (systemd-journal) with signal SIGABRT. > [331290.096246] systemd[1]: logrotate.service: Deactivated successfully. > [331290.285285] systemd-coredump[1267441]: Process 528 (systemd-journal) of > user 0 dumped core. > [331290.286002] systemd-coredump[1267441]: Coredump diverted to > /var/lib/systemd/coredump/core.systemd-journal.0.39df468c96764e8c8fffc53a0d0a47d1.528.1686881003000000.zst > [331290.287045] systemd-coredump[1267441]: Module libaudit.so.1 without > build-id. > [331290.287574] systemd-coredump[1267441]: Module libnftnl.so.11 without > build-id. > [331290.288111] systemd-coredump[1267441]: Module libmnl.so.0 without build-id. > [331290.288587] systemd-coredump[1267441]: Module libgpg-error.so.0 without > build-id. > [331290.289076] systemd-coredump[1267441]: Module libattr.so.1 without > build-id. > [331290.289541] systemd-coredump[1267441]: Module libzstd.so.1 without > build-id. > [331290.289991] systemd-coredump[1267441]: Module liblzma.so.5 without > build-id. > [331290.290446] systemd-coredump[1267441]: Module libseccomp.so.2 without > build-id. > [331290.290940] systemd-coredump[1267441]: Module libpam.so.0 without build-id. > [331290.291419] systemd-coredump[1267441]: Module liblz4.so.1 without build-id. > [331290.291893] systemd-coredump[1267441]: Module libkmod.so.2 without > build-id. > [331290.292370] systemd-coredump[1267441]: Module libip4tc.so.2 without > build-id. > [331290.295165] systemd-coredump[1267441]: Module libgcrypt.so.20 without > build-id. > [331290.295678] systemd-coredump[1267441]: Module libcrypt.so.2 without > build-id. > [331290.296160] systemd-coredump[1267441]: Module libcap.so.2 without build-id. > [331290.296708] systemd-coredump[1267441]: Module libacl.so.1 without build-id. > [331290.297206] systemd-coredump[1267441]: Module libsystemd-shared-253.so > without build-id. > [331290.297768] systemd-coredump[1267441]: Module systemd-journald without > build-id. > [331290.298273] systemd-coredump[1267441]: Stack trace of thread 528: > [331290.298714] systemd-coredump[1267441]: #0 0x00007f3e96431de8 > check_object_header (libsystemd-shared-253.so + 0x231de8) > [331290.299435] systemd-coredump[1267441]: #1 0x00007f3e964330d2 > journal_file_move_to_object (libsystemd-shared-253.so + 0x2330d2) > [331290.300199] systemd-coredump[1267441]: #2 0x00007f3e96434a71 > journal_file_find_data_object_with_hash (libsystemd-shared-253.so + 0x234a71) > [331290.302207] systemd-coredump[1267441]: #3 0x00007f3e96434d3b > journal_file_append_data (libsystemd-shared-253.so + 0x234d3b) > [331290.302983] systemd-coredump[1267441]: #4 0x00007f3e96437243 > journal_file_append_entry (libsystemd-shared-253.so + 0x237243) > [331290.303767] systemd-coredump[1267441]: #5 0x000056536a4a7cd5 > server_dispatch_message_real (systemd-journald + 0x10cd5) > [331290.304504] systemd-coredump[1267441]: #6 0x000056536a4a134b > dev_kmsg_record (systemd-journald + 0xa34b) > [331290.305195] systemd-coredump[1267441]: #7 0x000056536a4a182b > server_read_dev_kmsg (systemd-journald + 0xa82b) > [331290.305922] systemd-coredump[1267441]: #8 0x00007f3e9645c140 > source_dispatch (libsystemd-shared-253.so + 0x25c140) > [331290.306640] systemd-coredump[1267441]: #9 0x00007f3e9645c42d > sd_event_dispatch (libsystemd-shared-253.so + 0x25c42d) > [331290.307350] systemd-coredump[1267441]: #10 0x00007f3e9645cb48 sd_event_run > (libsystemd-shared-253.so + 0x25cb48) > [331290.308043] systemd-coredump[1267441]: #11 0x000056536a4a0568 main > (systemd-journald + 0x9568) > [331290.308652] systemd-coredump[1267441]: #12 0x00007f3e9603dace > __libc_start_call_main (libc.so.6 + 0x23ace) > [331290.309292] systemd-coredump[1267441]: #13 0x00007f3e9603db89 > __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x23b89) > [331290.310048] systemd-coredump[1267441]: #14 0x000056536a4a0835 _start > (systemd-journald + 0x9835) > [331290.310692] systemd-coredump[1267441]: ELF object binary architecture: AMD > x86-64 > > > This go back to normal until around 20:03 (around 14 hours later): > > [389370.873026] rcu: INFO: rcu_preempt self-detected stall on CPU > [389370.873481] rcu: 0-....: (21000 ticks this GP) > idle=91fc/1/0x4000000000000000 softirq=85252827/85252827 fqs=4704 > [389370.874126] (t=21002 jiffies g=167843445 q=13889 ncpus=3) > [389370.874500] CPU: 0 PID: 2202919 Comm: .postgres-wrapp Not tainted 6.1.31 > #1-NixOS > [389370.874979] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 > [389370.875710] RIP: 0010:xas_descend+0x26/0x70 > [389370.875991] Code: cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 > 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 <75> 08 48 > 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 > [389370.877164] RSP: 0018:ffffb427c4917bf0 EFLAGS: 00000246 > [389370.877512] RAX: ffff98871f8dbdaa RBX: ffffb427c4917d70 RCX: > 0000000000000002 > [389370.877967] RDX: 0000000000000005 RSI: ffff988876d3c000 RDI: > ffffb427c4917c00 > [389370.878426] RBP: 000000000000f177 R08: ffffb427c4917e68 R09: > ffff988846485d38 > [389370.878880] R10: ffffb427c4917e60 R11: ffff988846485d38 R12: > 000000000000f177 > [389370.879341] R13: ffff988827b4ae00 R14: 000000000000f176 R15: > ffffb427c4917e90 > [389370.879798] FS: 00007ff8de817800(0000) GS:ffff98887ac00000(0000) > knlGS:0000000000000000 > [389370.880308] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [389370.880683] CR2: 00007ff881c8c000 CR3: 000000010dfea000 CR4: > 00000000000006f0 > [389370.881139] Call Trace: > [389370.881309] <IRQ> > [389370.881463] ? rcu_dump_cpu_stacks+0xc8/0x100 > [389370.881755] ? rcu_sched_clock_irq.cold+0x15b/0x2fb > [389370.882077] ? sched_slice+0x87/0x140 > [389370.882325] ? perf_event_task_tick+0x64/0x370 > [389370.882629] ? __cgroup_account_cputime_field+0x5b/0xa0 > [389370.882968] ? update_process_times+0x77/0xb0 > [389370.883258] ? tick_sched_handle+0x34/0x50 > [389370.883538] ? tick_sched_timer+0x6f/0x80 > [389370.883802] ? tick_sched_do_timer+0xa0/0xa0 > [389370.884084] ? __hrtimer_run_queues+0x112/0x2b0 > [389370.884388] ? hrtimer_interrupt+0xfe/0x220 > [389370.884666] ? __sysvec_apic_timer_interrupt+0x7f/0x170 > [389370.885006] ? sysvec_apic_timer_interrupt+0x99/0xc0 > [389370.885338] </IRQ> > [389370.885488] <TASK> > [389370.885636] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 > [389370.885983] ? xas_descend+0x26/0x70 > [389370.886224] xas_load+0x30/0x40 > [389370.886456] filemap_get_read_batch+0x16e/0x250 > [389370.886758] filemap_get_pages+0xa9/0x630 > [389370.887026] ? atime_needs_update+0x104/0x180 > [389370.887317] ? touch_atime+0x46/0x1f0 > [389370.887573] filemap_read+0xd2/0x340 > [389370.887817] xfs_file_buffered_read+0x4f/0xd0 [xfs] > [389370.888210] xfs_file_read_iter+0x6a/0xd0 [xfs] > [389370.888571] vfs_read+0x23c/0x310 > [389370.888797] __x64_sys_pread64+0x94/0xc0 > [389370.889058] do_syscall_64+0x3a/0x90 > [389370.889299] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [389370.889634] RIP: 0033:0x7ff8de90e747 > [389370.889900] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 > f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00 > f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24 > [389370.891063] RSP: 002b:00007fff5206a848 EFLAGS: 00000202 ORIG_RAX: > 0000000000000011 > [389370.891549] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: > 00007ff8de90e747 > [389370.892003] RDX: 0000000000002000 RSI: 00007ff881c8ce80 RDI: > 00000000000001c7 > [389370.892466] RBP: 0000000000000001 R08: 000000000a00000d R09: > 0000000000000000 > [389370.892923] R10: 000000000f176000 R11: 0000000000000202 R12: > 00007ff8cdd91278 > [389370.893386] R13: 0000562bb434af75 R14: 0000562bb434f510 R15: > 0000562bb5e55828 > [389370.893843] </TASK> > > > At this point we have to restart the VM externally because it didn't respond > properly any longer. > > -- > You may reply to this email to add a comment. > > You are receiving this mail because: > You are watching the assignee of the bug. Liebe Grüße, Christian Theune -- Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0 Flying Circus Internet Operations GmbH · https://flyingcircus.io Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick