https://bugzilla.kernel.org/show_bug.cgi?id=217572 Bug ID: 217572 Summary: Initial blocked tasks causing deterioration over hours until (nearly) complete system lockup and data loss with PostgreSQL 13 Product: File System Version: 2.5 Hardware: All OS: Linux Status: NEW Severity: high Priority: P3 Component: XFS Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx Reporter: ct@xxxxxxxxxxxxxxx Regression: No Last Friday we experienced the following hung task messages with PostgreSQL while performing our nightly backup using pg_dump. Normally this takes at most a few minutes with IO being stressed. This time it caused high SYS CPU time, went on for almost 20 minutes and caused the PostgreSQL dump to fail with inconsistent data. Around 3:50 AM we got this: [330289.821046] INFO: task .postgres-wrapp:11884 blocked for more than 122 seconds. [330289.821830] Not tainted 6.1.31 #1-NixOS [330289.822285] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [330289.823098] task:.postgres-wrapp state:D stack:0 pid:11884 ppid:11881 flags:0x00000002 [330289.823911] Call Trace: [330289.824221] <TASK> [330289.824451] __schedule+0x35d/0x1370 [330289.824858] ? mntput_no_expire+0x4a/0x250 [330289.825307] schedule+0x5d/0xe0 [330289.825630] rwsem_down_write_slowpath+0x34e/0x730 [330289.826128] xfs_ilock+0xeb/0xf0 [xfs] [330289.826599] xfs_file_buffered_write+0x119/0x300 [xfs] [330289.827212] ? selinux_file_permission+0x10b/0x150 [330289.827683] vfs_write+0x244/0x400 [330289.828049] __x64_sys_pwrite64+0x94/0xc0 [330289.828459] do_syscall_64+0x3a/0x90 [330289.828801] entry_SYSCALL_64_after_hwframe+0x63/0xcd [330289.829302] RIP: 0033:0x7ff8de90e7f7 [330289.829681] RSP: 002b:00007fff52069b08 EFLAGS: 00000202 ORIG_RAX: 0000000000000012 [330289.830408] RAX: ffffffffffffffda RBX: 0000562bb434f510 RCX: 00007ff8de90e7f7 [330289.831073] RDX: 0000000000002000 RSI: 00007ff888b52e80 RDI: 000000000000003b [330289.831762] RBP: 00000000000021b0 R08: 000000000a000010 R09: 0000000000000040 [330289.832440] R10: 0000000004ed8000 R11: 0000000000000202 R12: 0000000000002000 [330289.833130] R13: 0000000004ed8000 R14: 00007ff8de8176c8 R15: 0000562bb434af75 [330289.833803] </TASK> [330289.834064] INFO: task .postgres-wrapp:1245532 blocked for more than 122 seconds. [330289.834771] Not tainted 6.1.31 #1-NixOS [330289.835209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [330289.835926] task:.postgres-wrapp state:D stack:0 pid:1245532 ppid:11881 flags:0x00000002 [330289.836752] Call Trace: [330289.837010] <TASK> [330289.837258] __schedule+0x35d/0x1370 [330289.837622] ? page_add_file_rmap+0xba/0x2f0 [330289.838047] ? do_set_pte+0x174/0x1c0 [330289.838420] ? unix_stream_read_generic+0x223/0xa60 [330289.838887] schedule+0x5d/0xe0 [330289.839265] schedule_preempt_disabled+0x14/0x30 [330289.839758] rwsem_down_read_slowpath+0x29e/0x4f0 [330289.840521] down_read+0x47/0xb0 [330289.840853] xfs_ilock+0x79/0xf0 [xfs] [330289.841346] xfs_file_buffered_read+0x44/0xd0 [xfs] [330289.841945] xfs_file_read_iter+0x6a/0xd0 [xfs] [330289.842496] vfs_read+0x23c/0x310 [330289.842845] __x64_sys_pread64+0x94/0xc0 [330289.843303] do_syscall_64+0x3a/0x90 [330289.843857] entry_SYSCALL_64_after_hwframe+0x63/0xcd [330289.844420] RIP: 0033:0x7ff8de90e747 [330289.844898] RSP: 002b:00007fff5206aae8 EFLAGS: 00000202 ORIG_RAX: 0000000000000011 [330289.845705] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff8de90e747 [330289.846488] RDX: 0000000000002000 RSI: 00007ff8896e0e80 RDI: 0000000000000010 [330289.847133] RBP: 0000000000000001 R08: 000000000a00000d R09: 0000000000000000 [330289.847627] R10: 0000000018fb8000 R11: 0000000000000202 R12: 00007ff8cdd91278 [330289.848113] R13: 0000562bb434af75 R14: 0000562bb434f510 R15: 0000562bb5e89e70 [330289.848631] </TASK> This keeps going on for a few minutes until: [330584.618978] rcu: INFO: rcu_preempt self-detected stall on CPU [330584.619413] rcu: 1-....: (20999 ticks this GP) idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=5231 [330584.620052] (t=21000 jiffies g=142859597 q=21168 ncpus=3) [330584.620409] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31 #1-NixOS [330584.620880] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [330584.621588] RIP: 0010:xas_load+0xb/0x40 [330584.621844] Code: 04 48 8b 44 c2 08 c3 cc cc cc cc 48 8b 07 48 8b 40 08 c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 e8 3b ff ff ff 48 89 c2 83 e2 03 <48> 83 fa 02 75 08 48 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f [330584.622996] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000202 [330584.623337] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX: 0000000000000002 [330584.623787] RDX: 0000000000000002 RSI: ffff98871f9d2920 RDI: ffffb427c3387c00 [330584.624239] RBP: 000000000000f161 R08: ffffb427c3387e68 R09: ffffda1444482000 [330584.624686] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000000f161 [330584.625136] R13: ffff9887ad3c2700 R14: 000000000000f160 R15: ffffb427c3387e90 [330584.625591] FS: 00007ff8de817800(0000) GS:ffff98887ac80000(0000) knlGS:0000000000000000 [330584.626097] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [330584.626467] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4: 00000000000006e0 [330584.626954] Call Trace: [330584.627125] <IRQ> [330584.627281] ? rcu_dump_cpu_stacks+0xc8/0x100 [330584.627567] ? rcu_sched_clock_irq.cold+0x15b/0x2fb [330584.627882] ? sched_slice+0x87/0x140 [330584.628126] ? timekeeping_update+0xdd/0x130 [330584.628414] ? __cgroup_account_cputime_field+0x5b/0xa0 [330584.628751] ? update_process_times+0x77/0xb0 [330584.629036] ? update_wall_time+0xc/0x20 [330584.629300] ? tick_sched_handle+0x34/0x50 [330584.629564] ? tick_sched_timer+0x6f/0x80 [330584.629823] ? tick_sched_do_timer+0xa0/0xa0 [330584.630103] ? __hrtimer_run_queues+0x112/0x2b0 [330584.630404] ? hrtimer_interrupt+0xfe/0x220 [330584.630678] ? __sysvec_apic_timer_interrupt+0x7f/0x170 [330584.631016] ? sysvec_apic_timer_interrupt+0x99/0xc0 [330584.631339] </IRQ> [330584.631485] <TASK> [330584.631631] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 [330584.631974] ? xas_load+0xb/0x40 [330584.632195] ? xas_load+0x30/0x40 [330584.632415] filemap_get_read_batch+0x16e/0x250 [330584.632710] filemap_get_pages+0xa9/0x630 [330584.632972] ? memcg_check_events+0xda/0x210 [330584.633259] ? free_unref_page_commit+0x7c/0x170 [330584.633560] ? _raw_spin_unlock_irqrestore+0x23/0x40 [330584.633880] ? free_unref_page+0x1ac/0x220 [330584.634146] filemap_read+0xd2/0x340 [330584.634389] xfs_file_buffered_read+0x4f/0xd0 [xfs] [330584.634778] xfs_file_read_iter+0x6a/0xd0 [xfs] [330584.635123] vfs_read+0x23c/0x310 [330584.635354] __x64_sys_pread64+0x94/0xc0 [330584.635609] do_syscall_64+0x3a/0x90 [330584.635846] entry_SYSCALL_64_after_hwframe+0x63/0xcd [330584.636174] RIP: 0033:0x7ff8de90e747 [330584.636437] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24 [330584.637582] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX: 0000000000000011 [330584.638056] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff8de90e747 [330584.638506] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI: 0000000000000230 [330584.638954] RBP: 0000000000000001 R08: 000000000a00000d R09: 0000000000000000 [330584.639411] R10: 000000000f160000 R11: 0000000000000202 R12: 00007ff8cdd92688 [330584.639862] R13: 0000562bb434af75 R14: 0000562bb434f510 R15: 0000562bb5fe04d0 [330584.640313] </TASK> [330584.721995] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-.... } 21015 jiffies s: 7297 root: 0x2/. [330584.722930] rcu: blocking rcu_node structures (internal RCU debug): [330584.723428] Sending NMI from CPU 0 to CPUs 1: [330584.723763] NMI backtrace for cpu 1 [330584.723769] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31 #1-NixOS [330584.723772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [330584.723775] RIP: 0010:xas_load+0x29/0x40 [330584.723783] Code: 00 e8 3b ff ff ff 48 89 c2 83 e2 03 48 83 fa 02 75 08 48 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f 10 48 8d 70 fe 38 48 fe <72> ee e8 20 fe ff ff 80 3e 00 75 d0 c3 cc cc cc cc 66 0f 1f 44 00 [330584.723785] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000246 [330584.723787] RAX: ffff98871f8dbdaa RBX: ffffb427c3387d70 RCX: 0000000000000000 [330584.723788] RDX: 0000000000000002 RSI: ffff98871f8dbda8 RDI: ffffb427c3387c00 [330584.723789] RBP: 000000000000f161 R08: ffffb427c3387e68 R09: ffffda1444482000 [330584.723790] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000000f161 [330584.723791] R13: ffff9887ad3c2700 R14: 000000000000f160 R15: ffffb427c3387e90 [330584.723793] FS: 00007ff8de817800(0000) GS:ffff98887ac80000(0000) knlGS:0000000000000000 [330584.723794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [330584.723795] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4: 00000000000006e0 [330584.723798] Call Trace: [330584.723801] <NMI> [330584.723805] ? nmi_cpu_backtrace.cold+0x1b/0x76 [330584.723810] ? nmi_cpu_backtrace_handler+0xd/0x20 [330584.723814] ? nmi_handle+0x5d/0x120 [330584.723817] ? xas_load+0x29/0x40 [330584.723818] ? default_do_nmi+0x69/0x170 [330584.723821] ? exc_nmi+0x13c/0x170 [330584.723823] ? end_repeat_nmi+0x16/0x67 [330584.723828] ? xas_load+0x29/0x40 [330584.723830] ? xas_load+0x29/0x40 [330584.723832] ? xas_load+0x29/0x40 [330584.723834] </NMI> [330584.723834] <TASK> [330584.723835] filemap_get_read_batch+0x16e/0x250 [330584.723840] filemap_get_pages+0xa9/0x630 [330584.723842] ? memcg_check_events+0xda/0x210 [330584.723845] ? free_unref_page_commit+0x7c/0x170 [330584.723849] ? _raw_spin_unlock_irqrestore+0x23/0x40 [330584.723851] ? free_unref_page+0x1ac/0x220 [330584.723852] filemap_read+0xd2/0x340 [330584.723857] xfs_file_buffered_read+0x4f/0xd0 [xfs] [330584.723935] xfs_file_read_iter+0x6a/0xd0 [xfs] [330584.723990] vfs_read+0x23c/0x310 [330584.723995] __x64_sys_pread64+0x94/0xc0 [330584.723997] do_syscall_64+0x3a/0x90 [330584.724000] entry_SYSCALL_64_after_hwframe+0x63/0xcd [330584.724003] RIP: 0033:0x7ff8de90e747 [330584.724019] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24 [330584.724020] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX: 0000000000000011 [330584.724021] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff8de90e747 [330584.724022] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI: 0000000000000230 [330584.724023] RBP: 0000000000000001 R08: 000000000a00000d R09: 0000000000000000 [330584.724023] R10: 000000000f160000 R11: 0000000000000202 R12: 00007ff8cdd92688 [330584.724024] R13: 0000562bb434af75 R14: 0000562bb434f510 R15: 0000562bb5fe04d0 [330584.724026] </TASK> This keeps repeating until around 4:03 which ends in systemd-journal getting coredumped: [331277.846966] rcu: INFO: rcu_preempt self-detected stall on CPU [331277.847413] rcu: 1-....: (713858 ticks this GP) idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=311996 [331277.848088] (t=714253 jiffies g=142859597 q=1821602 ncpus=3) [331277.848462] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31 #1-NixOS [331277.848941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [331277.849667] RIP: 0010:xas_descend+0x22/0x70 [331277.849952] Code: cc cc cc cc cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 <48> 83 f9 02 75 08 48 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc [331277.851120] RSP: 0018:ffffb427c3387bf0 EFLAGS: 00000202 [331277.851468] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX: 0000000000000002 [331277.851928] RDX: 000000000000000f RSI: ffff98871f9d2920 RDI: ffffb427c3387c00 [331277.852386] RBP: 000000000000f161 R08: ffffb427c3387e68 R09: ffffda1444482000 [331277.852847] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000000f161 [331277.853303] R13: ffff9887ad3c2700 R14: 000000000000f160 R15: ffffb427c3387e90 [331277.853766] FS: 00007ff8de817800(0000) GS:ffff98887ac80000(0000) knlGS:0000000000000000 [331277.854278] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [331277.854653] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4: 00000000000006e0 [331277.855111] Call Trace: [331277.855284] <IRQ> [331277.855433] ? rcu_dump_cpu_stacks+0xc8/0x100 [331277.855730] ? rcu_sched_clock_irq.cold+0x15b/0x2fb [331277.856049] ? sched_slice+0x87/0x140 [331277.856294] ? timekeeping_update+0xdd/0x130 [331277.856587] ? __cgroup_account_cputime_field+0x5b/0xa0 [331277.856930] ? update_process_times+0x77/0xb0 [331277.857222] ? update_wall_time+0xc/0x20 [331277.857482] ? tick_sched_handle+0x34/0x50 [331277.857758] ? tick_sched_timer+0x6f/0x80 [331277.858024] ? tick_sched_do_timer+0xa0/0xa0 [331277.858306] ? __hrtimer_run_queues+0x112/0x2b0 [331277.858613] ? hrtimer_interrupt+0xfe/0x220 [331277.858891] ? __sysvec_apic_timer_interrupt+0x7f/0x170 [331277.859235] ? sysvec_apic_timer_interrupt+0x99/0xc0 [331277.859558] </IRQ> [331277.859710] <TASK> [331277.859857] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 [331277.860206] ? xas_descend+0x22/0x70 [331277.860446] xas_load+0x30/0x40 [331277.860665] filemap_get_read_batch+0x16e/0x250 [331277.860967] filemap_get_pages+0xa9/0x630 [331277.861233] ? memcg_check_events+0xda/0x210 [331277.861517] ? free_unref_page_commit+0x7c/0x170 [331277.861834] ? _raw_spin_unlock_irqrestore+0x23/0x40 [331277.862158] ? free_unref_page+0x1ac/0x220 [331277.862427] filemap_read+0xd2/0x340 [331277.862677] xfs_file_buffered_read+0x4f/0xd0 [xfs] [331277.863072] xfs_file_read_iter+0x6a/0xd0 [xfs] [331277.863424] vfs_read+0x23c/0x310 [331277.863657] __x64_sys_pread64+0x94/0xc0 [331277.863917] do_syscall_64+0x3a/0x90 [331277.864159] entry_SYSCALL_64_after_hwframe+0x63/0xcd [331277.864488] RIP: 0033:0x7ff8de90e747 [331277.864752] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24 [331277.865917] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX: 0000000000000011 [331277.866395] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff8de90e747 [331277.866848] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI: 0000000000000230 [331277.867298] RBP: 0000000000000001 R08: 000000000a00000d R09: 0000000000000000 [331277.867750] R10: 000000000f160000 R11: 0000000000000202 R12: 00007ff8cdd92688 [331277.868199] R13: 0000562bb434af75 R14: 0000562bb434f510 R15: 0000562bb5fe04d0 [331277.868658] </TASK> Fri Jun 16 04:03:23 AM CEST 2023 -- SERIAL CONSOLE IS LIVE -- [331289.987247] systemd[1]: Starting Serial console liveness marker... [331289.992645] systemd[1]: nscd.service: Deactivated successfully. [331290.011838] systemd[1]: nscd.service: Consumed 3min 16.251s CPU time, received 12.0M IP traffic, sent 8.2M IP traffic. [331290.020289] systemd[1]: serial-console-liveness.service: Deactivated successfully. [331290.035818] systemd[1]: Finished Serial console liveness marker. [331290.068776] systemd[1]: Started Logrotate Service. [331290.069700] systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)! [331290.070475] systemd[1]: systemd-journald.service: Killing process 528 (systemd-journal) with signal SIGABRT. [331290.096246] systemd[1]: logrotate.service: Deactivated successfully. [331290.285285] systemd-coredump[1267441]: Process 528 (systemd-journal) of user 0 dumped core. [331290.286002] systemd-coredump[1267441]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.39df468c96764e8c8fffc53a0d0a47d1.528.1686881003000000.zst [331290.287045] systemd-coredump[1267441]: Module libaudit.so.1 without build-id. [331290.287574] systemd-coredump[1267441]: Module libnftnl.so.11 without build-id. [331290.288111] systemd-coredump[1267441]: Module libmnl.so.0 without build-id. [331290.288587] systemd-coredump[1267441]: Module libgpg-error.so.0 without build-id. [331290.289076] systemd-coredump[1267441]: Module libattr.so.1 without build-id. [331290.289541] systemd-coredump[1267441]: Module libzstd.so.1 without build-id. [331290.289991] systemd-coredump[1267441]: Module liblzma.so.5 without build-id. [331290.290446] systemd-coredump[1267441]: Module libseccomp.so.2 without build-id. [331290.290940] systemd-coredump[1267441]: Module libpam.so.0 without build-id. [331290.291419] systemd-coredump[1267441]: Module liblz4.so.1 without build-id. [331290.291893] systemd-coredump[1267441]: Module libkmod.so.2 without build-id. [331290.292370] systemd-coredump[1267441]: Module libip4tc.so.2 without build-id. [331290.295165] systemd-coredump[1267441]: Module libgcrypt.so.20 without build-id. [331290.295678] systemd-coredump[1267441]: Module libcrypt.so.2 without build-id. [331290.296160] systemd-coredump[1267441]: Module libcap.so.2 without build-id. [331290.296708] systemd-coredump[1267441]: Module libacl.so.1 without build-id. [331290.297206] systemd-coredump[1267441]: Module libsystemd-shared-253.so without build-id. [331290.297768] systemd-coredump[1267441]: Module systemd-journald without build-id. [331290.298273] systemd-coredump[1267441]: Stack trace of thread 528: [331290.298714] systemd-coredump[1267441]: #0 0x00007f3e96431de8 check_object_header (libsystemd-shared-253.so + 0x231de8) [331290.299435] systemd-coredump[1267441]: #1 0x00007f3e964330d2 journal_file_move_to_object (libsystemd-shared-253.so + 0x2330d2) [331290.300199] systemd-coredump[1267441]: #2 0x00007f3e96434a71 journal_file_find_data_object_with_hash (libsystemd-shared-253.so + 0x234a71) [331290.302207] systemd-coredump[1267441]: #3 0x00007f3e96434d3b journal_file_append_data (libsystemd-shared-253.so + 0x234d3b) [331290.302983] systemd-coredump[1267441]: #4 0x00007f3e96437243 journal_file_append_entry (libsystemd-shared-253.so + 0x237243) [331290.303767] systemd-coredump[1267441]: #5 0x000056536a4a7cd5 server_dispatch_message_real (systemd-journald + 0x10cd5) [331290.304504] systemd-coredump[1267441]: #6 0x000056536a4a134b dev_kmsg_record (systemd-journald + 0xa34b) [331290.305195] systemd-coredump[1267441]: #7 0x000056536a4a182b server_read_dev_kmsg (systemd-journald + 0xa82b) [331290.305922] systemd-coredump[1267441]: #8 0x00007f3e9645c140 source_dispatch (libsystemd-shared-253.so + 0x25c140) [331290.306640] systemd-coredump[1267441]: #9 0x00007f3e9645c42d sd_event_dispatch (libsystemd-shared-253.so + 0x25c42d) [331290.307350] systemd-coredump[1267441]: #10 0x00007f3e9645cb48 sd_event_run (libsystemd-shared-253.so + 0x25cb48) [331290.308043] systemd-coredump[1267441]: #11 0x000056536a4a0568 main (systemd-journald + 0x9568) [331290.308652] systemd-coredump[1267441]: #12 0x00007f3e9603dace __libc_start_call_main (libc.so.6 + 0x23ace) [331290.309292] systemd-coredump[1267441]: #13 0x00007f3e9603db89 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x23b89) [331290.310048] systemd-coredump[1267441]: #14 0x000056536a4a0835 _start (systemd-journald + 0x9835) [331290.310692] systemd-coredump[1267441]: ELF object binary architecture: AMD x86-64 This go back to normal until around 20:03 (around 14 hours later): [389370.873026] rcu: INFO: rcu_preempt self-detected stall on CPU [389370.873481] rcu: 0-....: (21000 ticks this GP) idle=91fc/1/0x4000000000000000 softirq=85252827/85252827 fqs=4704 [389370.874126] (t=21002 jiffies g=167843445 q=13889 ncpus=3) [389370.874500] CPU: 0 PID: 2202919 Comm: .postgres-wrapp Not tainted 6.1.31 #1-NixOS [389370.874979] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [389370.875710] RIP: 0010:xas_descend+0x26/0x70 [389370.875991] Code: cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 <75> 08 48 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 [389370.877164] RSP: 0018:ffffb427c4917bf0 EFLAGS: 00000246 [389370.877512] RAX: ffff98871f8dbdaa RBX: ffffb427c4917d70 RCX: 0000000000000002 [389370.877967] RDX: 0000000000000005 RSI: ffff988876d3c000 RDI: ffffb427c4917c00 [389370.878426] RBP: 000000000000f177 R08: ffffb427c4917e68 R09: ffff988846485d38 [389370.878880] R10: ffffb427c4917e60 R11: ffff988846485d38 R12: 000000000000f177 [389370.879341] R13: ffff988827b4ae00 R14: 000000000000f176 R15: ffffb427c4917e90 [389370.879798] FS: 00007ff8de817800(0000) GS:ffff98887ac00000(0000) knlGS:0000000000000000 [389370.880308] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [389370.880683] CR2: 00007ff881c8c000 CR3: 000000010dfea000 CR4: 00000000000006f0 [389370.881139] Call Trace: [389370.881309] <IRQ> [389370.881463] ? rcu_dump_cpu_stacks+0xc8/0x100 [389370.881755] ? rcu_sched_clock_irq.cold+0x15b/0x2fb [389370.882077] ? sched_slice+0x87/0x140 [389370.882325] ? perf_event_task_tick+0x64/0x370 [389370.882629] ? __cgroup_account_cputime_field+0x5b/0xa0 [389370.882968] ? update_process_times+0x77/0xb0 [389370.883258] ? tick_sched_handle+0x34/0x50 [389370.883538] ? tick_sched_timer+0x6f/0x80 [389370.883802] ? tick_sched_do_timer+0xa0/0xa0 [389370.884084] ? __hrtimer_run_queues+0x112/0x2b0 [389370.884388] ? hrtimer_interrupt+0xfe/0x220 [389370.884666] ? __sysvec_apic_timer_interrupt+0x7f/0x170 [389370.885006] ? sysvec_apic_timer_interrupt+0x99/0xc0 [389370.885338] </IRQ> [389370.885488] <TASK> [389370.885636] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 [389370.885983] ? xas_descend+0x26/0x70 [389370.886224] xas_load+0x30/0x40 [389370.886456] filemap_get_read_batch+0x16e/0x250 [389370.886758] filemap_get_pages+0xa9/0x630 [389370.887026] ? atime_needs_update+0x104/0x180 [389370.887317] ? touch_atime+0x46/0x1f0 [389370.887573] filemap_read+0xd2/0x340 [389370.887817] xfs_file_buffered_read+0x4f/0xd0 [xfs] [389370.888210] xfs_file_read_iter+0x6a/0xd0 [xfs] [389370.888571] vfs_read+0x23c/0x310 [389370.888797] __x64_sys_pread64+0x94/0xc0 [389370.889058] do_syscall_64+0x3a/0x90 [389370.889299] entry_SYSCALL_64_after_hwframe+0x63/0xcd [389370.889634] RIP: 0033:0x7ff8de90e747 [389370.889900] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24 [389370.891063] RSP: 002b:00007fff5206a848 EFLAGS: 00000202 ORIG_RAX: 0000000000000011 [389370.891549] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff8de90e747 [389370.892003] RDX: 0000000000002000 RSI: 00007ff881c8ce80 RDI: 00000000000001c7 [389370.892466] RBP: 0000000000000001 R08: 000000000a00000d R09: 0000000000000000 [389370.892923] R10: 000000000f176000 R11: 0000000000000202 R12: 00007ff8cdd91278 [389370.893386] R13: 0000562bb434af75 R14: 0000562bb434f510 R15: 0000562bb5e55828 [389370.893843] </TASK> At this point we have to restart the VM externally because it didn't respond properly any longer. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.