[Bug 217572] Initial blocked tasks causing deterioration over hours until (nearly) complete system lockup and data loss with PostgreSQL 13

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



https://bugzilla.kernel.org/show_bug.cgi?id=217572

--- Comment #1 from Christian Theune (ct@xxxxxxxxxxxxxxx) ---
Hi,

is there anything I can do to raise interest on this one? :)

> On 19. Jun 2023, at 10:29, bugzilla-daemon@xxxxxxxxxx wrote:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=217572
> 
>            Bug ID: 217572
>           Summary: Initial blocked tasks causing deterioration over hours
>                    until (nearly) complete system lockup and data loss
>                    with PostgreSQL 13
>           Product: File System
>           Version: 2.5
>          Hardware: All
>                OS: Linux
>            Status: NEW
>          Severity: high
>          Priority: P3
>         Component: XFS
>          Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx
>          Reporter: ct@xxxxxxxxxxxxxxx
>        Regression: No
> 
> Last Friday we experienced the following hung task messages with PostgreSQL
> while performing our nightly backup using pg_dump. Normally this takes at
> most
> a few minutes with IO being stressed. This time it caused high SYS CPU time,
> went on for almost 20 minutes and caused the PostgreSQL dump to fail with
> inconsistent data.
> 
> Around 3:50 AM we got this:
> 
> [330289.821046] INFO: task .postgres-wrapp:11884 blocked for more than 122
> seconds.
> [330289.821830]       Not tainted 6.1.31 #1-NixOS
> [330289.822285] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [330289.823098] task:.postgres-wrapp state:D stack:0     pid:11884 ppid:11881 
> flags:0x00000002
> [330289.823911] Call Trace:
> [330289.824221]  <TASK>
> [330289.824451]  __schedule+0x35d/0x1370
> [330289.824858]  ? mntput_no_expire+0x4a/0x250
> [330289.825307]  schedule+0x5d/0xe0
> [330289.825630]  rwsem_down_write_slowpath+0x34e/0x730
> [330289.826128]  xfs_ilock+0xeb/0xf0 [xfs]
> [330289.826599]  xfs_file_buffered_write+0x119/0x300 [xfs]
> [330289.827212]  ? selinux_file_permission+0x10b/0x150
> [330289.827683]  vfs_write+0x244/0x400
> [330289.828049]  __x64_sys_pwrite64+0x94/0xc0
> [330289.828459]  do_syscall_64+0x3a/0x90
> [330289.828801]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330289.829302] RIP: 0033:0x7ff8de90e7f7
> [330289.829681] RSP: 002b:00007fff52069b08 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000012
> [330289.830408] RAX: ffffffffffffffda RBX: 0000562bb434f510 RCX:
> 00007ff8de90e7f7
> [330289.831073] RDX: 0000000000002000 RSI: 00007ff888b52e80 RDI:
> 000000000000003b
> [330289.831762] RBP: 00000000000021b0 R08: 000000000a000010 R09:
> 0000000000000040
> [330289.832440] R10: 0000000004ed8000 R11: 0000000000000202 R12:
> 0000000000002000
> [330289.833130] R13: 0000000004ed8000 R14: 00007ff8de8176c8 R15:
> 0000562bb434af75
> [330289.833803]  </TASK>
> [330289.834064] INFO: task .postgres-wrapp:1245532 blocked for more than 122
> seconds.
> [330289.834771]       Not tainted 6.1.31 #1-NixOS
> [330289.835209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [330289.835926] task:.postgres-wrapp state:D stack:0     pid:1245532
> ppid:11881
> flags:0x00000002
> [330289.836752] Call Trace:
> [330289.837010]  <TASK>
> [330289.837258]  __schedule+0x35d/0x1370
> [330289.837622]  ? page_add_file_rmap+0xba/0x2f0
> [330289.838047]  ? do_set_pte+0x174/0x1c0
> [330289.838420]  ? unix_stream_read_generic+0x223/0xa60
> [330289.838887]  schedule+0x5d/0xe0
> [330289.839265]  schedule_preempt_disabled+0x14/0x30
> [330289.839758]  rwsem_down_read_slowpath+0x29e/0x4f0
> [330289.840521]  down_read+0x47/0xb0
> [330289.840853]  xfs_ilock+0x79/0xf0 [xfs]
> [330289.841346]  xfs_file_buffered_read+0x44/0xd0 [xfs]
> [330289.841945]  xfs_file_read_iter+0x6a/0xd0 [xfs]
> [330289.842496]  vfs_read+0x23c/0x310
> [330289.842845]  __x64_sys_pread64+0x94/0xc0
> [330289.843303]  do_syscall_64+0x3a/0x90
> [330289.843857]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330289.844420] RIP: 0033:0x7ff8de90e747
> [330289.844898] RSP: 002b:00007fff5206aae8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [330289.845705] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [330289.846488] RDX: 0000000000002000 RSI: 00007ff8896e0e80 RDI:
> 0000000000000010
> [330289.847133] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [330289.847627] R10: 0000000018fb8000 R11: 0000000000000202 R12:
> 00007ff8cdd91278
> [330289.848113] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5e89e70
> [330289.848631]  </TASK>
> 
> 
> This keeps going on for a few minutes until:
> 
> [330584.618978] rcu: INFO: rcu_preempt self-detected stall on CPU
> [330584.619413] rcu:    1-....: (20999 ticks this GP)
> idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=5231
> [330584.620052]         (t=21000 jiffies g=142859597 q=21168 ncpus=3)
> [330584.620409] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [330584.620880] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [330584.621588] RIP: 0010:xas_load+0xb/0x40
> [330584.621844] Code: 04 48 8b 44 c2 08 c3 cc cc cc cc 48 8b 07 48 8b 40 08
> c3
> cc cc cc cc 0f 1f 84 00 00 00 00 00 e8 3b ff ff ff 48 89 c2 83 e2 03 <48> 83
> fa
> 02 75 08 48 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f
> [330584.622996] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000202
> [330584.623337] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX:
> 0000000000000002
> [330584.623787] RDX: 0000000000000002 RSI: ffff98871f9d2920 RDI:
> ffffb427c3387c00
> [330584.624239] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
> ffffda1444482000
> [330584.624686] R10: 0000000000000001 R11: 0000000000000001 R12:
> 000000000000f161
> [330584.625136] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
> ffffb427c3387e90
> [330584.625591] FS:  00007ff8de817800(0000) GS:ffff98887ac80000(0000)
> knlGS:0000000000000000
> [330584.626097] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [330584.626467] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
> 00000000000006e0
> [330584.626954] Call Trace:
> [330584.627125]  <IRQ>
> [330584.627281]  ? rcu_dump_cpu_stacks+0xc8/0x100
> [330584.627567]  ? rcu_sched_clock_irq.cold+0x15b/0x2fb
> [330584.627882]  ? sched_slice+0x87/0x140
> [330584.628126]  ? timekeeping_update+0xdd/0x130
> [330584.628414]  ? __cgroup_account_cputime_field+0x5b/0xa0
> [330584.628751]  ? update_process_times+0x77/0xb0
> [330584.629036]  ? update_wall_time+0xc/0x20
> [330584.629300]  ? tick_sched_handle+0x34/0x50
> [330584.629564]  ? tick_sched_timer+0x6f/0x80
> [330584.629823]  ? tick_sched_do_timer+0xa0/0xa0
> [330584.630103]  ? __hrtimer_run_queues+0x112/0x2b0
> [330584.630404]  ? hrtimer_interrupt+0xfe/0x220
> [330584.630678]  ? __sysvec_apic_timer_interrupt+0x7f/0x170
> [330584.631016]  ? sysvec_apic_timer_interrupt+0x99/0xc0
> [330584.631339]  </IRQ>
> [330584.631485]  <TASK>
> [330584.631631]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [330584.631974]  ? xas_load+0xb/0x40
> [330584.632195]  ? xas_load+0x30/0x40
> [330584.632415]  filemap_get_read_batch+0x16e/0x250
> [330584.632710]  filemap_get_pages+0xa9/0x630
> [330584.632972]  ? memcg_check_events+0xda/0x210
> [330584.633259]  ? free_unref_page_commit+0x7c/0x170
> [330584.633560]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [330584.633880]  ? free_unref_page+0x1ac/0x220
> [330584.634146]  filemap_read+0xd2/0x340
> [330584.634389]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [330584.634778]  xfs_file_read_iter+0x6a/0xd0 [xfs]
> [330584.635123]  vfs_read+0x23c/0x310
> [330584.635354]  __x64_sys_pread64+0x94/0xc0
> [330584.635609]  do_syscall_64+0x3a/0x90
> [330584.635846]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330584.636174] RIP: 0033:0x7ff8de90e747
> [330584.636437] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [330584.637582] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [330584.638056] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [330584.638506] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
> 0000000000000230
> [330584.638954] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [330584.639411] R10: 000000000f160000 R11: 0000000000000202 R12:
> 00007ff8cdd92688
> [330584.639862] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5fe04d0
> [330584.640313]  </TASK>
> [330584.721995] rcu: INFO: rcu_preempt detected expedited stalls on
> CPUs/tasks:
> { 1-.... } 21015 jiffies s: 7297 root: 0x2/.
> [330584.722930] rcu: blocking rcu_node structures (internal RCU debug):
> [330584.723428] Sending NMI from CPU 0 to CPUs 1:
> [330584.723763] NMI backtrace for cpu 1
> [330584.723769] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [330584.723772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [330584.723775] RIP: 0010:xas_load+0x29/0x40
> [330584.723783] Code: 00 e8 3b ff ff ff 48 89 c2 83 e2 03 48 83 fa 02 75 08
> 48
> 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f 10 48 8d 70 fe 38 48 fe <72> ee
> e8
> 20 fe ff ff 80 3e 00 75 d0 c3 cc cc cc cc 66 0f 1f 44 00
> [330584.723785] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000246
> [330584.723787] RAX: ffff98871f8dbdaa RBX: ffffb427c3387d70 RCX:
> 0000000000000000
> [330584.723788] RDX: 0000000000000002 RSI: ffff98871f8dbda8 RDI:
> ffffb427c3387c00
> [330584.723789] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
> ffffda1444482000
> [330584.723790] R10: 0000000000000001 R11: 0000000000000001 R12:
> 000000000000f161
> [330584.723791] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
> ffffb427c3387e90
> [330584.723793] FS:  00007ff8de817800(0000) GS:ffff98887ac80000(0000)
> knlGS:0000000000000000
> [330584.723794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [330584.723795] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
> 00000000000006e0
> [330584.723798] Call Trace:
> [330584.723801]  <NMI>
> [330584.723805]  ? nmi_cpu_backtrace.cold+0x1b/0x76
> [330584.723810]  ? nmi_cpu_backtrace_handler+0xd/0x20
> [330584.723814]  ? nmi_handle+0x5d/0x120
> [330584.723817]  ? xas_load+0x29/0x40
> [330584.723818]  ? default_do_nmi+0x69/0x170
> [330584.723821]  ? exc_nmi+0x13c/0x170
> [330584.723823]  ? end_repeat_nmi+0x16/0x67
> [330584.723828]  ? xas_load+0x29/0x40
> [330584.723830]  ? xas_load+0x29/0x40
> [330584.723832]  ? xas_load+0x29/0x40
> [330584.723834]  </NMI>
> [330584.723834]  <TASK>
> [330584.723835]  filemap_get_read_batch+0x16e/0x250
> [330584.723840]  filemap_get_pages+0xa9/0x630
> [330584.723842]  ? memcg_check_events+0xda/0x210
> [330584.723845]  ? free_unref_page_commit+0x7c/0x170
> [330584.723849]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [330584.723851]  ? free_unref_page+0x1ac/0x220
> [330584.723852]  filemap_read+0xd2/0x340
> [330584.723857]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [330584.723935]  xfs_file_read_iter+0x6a/0xd0 [xfs]
> [330584.723990]  vfs_read+0x23c/0x310
> [330584.723995]  __x64_sys_pread64+0x94/0xc0
> [330584.723997]  do_syscall_64+0x3a/0x90
> [330584.724000]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330584.724003] RIP: 0033:0x7ff8de90e747
> [330584.724019] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [330584.724020] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [330584.724021] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [330584.724022] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
> 0000000000000230
> [330584.724023] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [330584.724023] R10: 000000000f160000 R11: 0000000000000202 R12:
> 00007ff8cdd92688
> [330584.724024] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5fe04d0
> [330584.724026]  </TASK>
> 
> 
> This keeps repeating until around 4:03 which ends in systemd-journal getting
> coredumped:
> 
> [331277.846966] rcu: INFO: rcu_preempt self-detected stall on CPU
> [331277.847413] rcu:    1-....: (713858 ticks this GP)
> idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=311996
> [331277.848088]         (t=714253 jiffies g=142859597 q=1821602 ncpus=3)
> [331277.848462] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [331277.848941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [331277.849667] RIP: 0010:xas_descend+0x22/0x70
> [331277.849952] Code: cc cc cc cc cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea
> 83
> e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 <48> 83
> f9
> 02 75 08 48 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc
> [331277.851120] RSP: 0018:ffffb427c3387bf0 EFLAGS: 00000202
> [331277.851468] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX:
> 0000000000000002
> [331277.851928] RDX: 000000000000000f RSI: ffff98871f9d2920 RDI:
> ffffb427c3387c00
> [331277.852386] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
> ffffda1444482000
> [331277.852847] R10: 0000000000000001 R11: 0000000000000001 R12:
> 000000000000f161
> [331277.853303] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
> ffffb427c3387e90
> [331277.853766] FS:  00007ff8de817800(0000) GS:ffff98887ac80000(0000)
> knlGS:0000000000000000
> [331277.854278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [331277.854653] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
> 00000000000006e0
> [331277.855111] Call Trace:
> [331277.855284]  <IRQ>
> [331277.855433]  ? rcu_dump_cpu_stacks+0xc8/0x100
> [331277.855730]  ? rcu_sched_clock_irq.cold+0x15b/0x2fb
> [331277.856049]  ? sched_slice+0x87/0x140
> [331277.856294]  ? timekeeping_update+0xdd/0x130
> [331277.856587]  ? __cgroup_account_cputime_field+0x5b/0xa0
> [331277.856930]  ? update_process_times+0x77/0xb0
> [331277.857222]  ? update_wall_time+0xc/0x20
> [331277.857482]  ? tick_sched_handle+0x34/0x50
> [331277.857758]  ? tick_sched_timer+0x6f/0x80
> [331277.858024]  ? tick_sched_do_timer+0xa0/0xa0
> [331277.858306]  ? __hrtimer_run_queues+0x112/0x2b0
> [331277.858613]  ? hrtimer_interrupt+0xfe/0x220
> [331277.858891]  ? __sysvec_apic_timer_interrupt+0x7f/0x170
> [331277.859235]  ? sysvec_apic_timer_interrupt+0x99/0xc0
> [331277.859558]  </IRQ>
> [331277.859710]  <TASK>
> [331277.859857]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [331277.860206]  ? xas_descend+0x22/0x70
> [331277.860446]  xas_load+0x30/0x40
> [331277.860665]  filemap_get_read_batch+0x16e/0x250
> [331277.860967]  filemap_get_pages+0xa9/0x630
> [331277.861233]  ? memcg_check_events+0xda/0x210
> [331277.861517]  ? free_unref_page_commit+0x7c/0x170
> [331277.861834]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [331277.862158]  ? free_unref_page+0x1ac/0x220
> [331277.862427]  filemap_read+0xd2/0x340
> [331277.862677]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [331277.863072]  xfs_file_read_iter+0x6a/0xd0 [xfs]
> [331277.863424]  vfs_read+0x23c/0x310
> [331277.863657]  __x64_sys_pread64+0x94/0xc0
> [331277.863917]  do_syscall_64+0x3a/0x90
> [331277.864159]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [331277.864488] RIP: 0033:0x7ff8de90e747
> [331277.864752] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [331277.865917] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [331277.866395] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [331277.866848] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
> 0000000000000230
> [331277.867298] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [331277.867750] R10: 000000000f160000 R11: 0000000000000202 R12:
> 00007ff8cdd92688
> [331277.868199] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5fe04d0
> [331277.868658]  </TASK>
> Fri Jun 16 04:03:23 AM CEST 2023 -- SERIAL CONSOLE IS LIVE --
> [331289.987247] systemd[1]: Starting Serial console liveness marker...
> [331289.992645] systemd[1]: nscd.service: Deactivated successfully.
> [331290.011838] systemd[1]: nscd.service: Consumed 3min 16.251s CPU time,
> received 12.0M IP traffic, sent 8.2M IP traffic.
> [331290.020289] systemd[1]: serial-console-liveness.service: Deactivated
> successfully.
> [331290.035818] systemd[1]: Finished Serial console liveness marker.
> [331290.068776] systemd[1]: Started Logrotate Service.
> [331290.069700] systemd[1]: systemd-journald.service: Watchdog timeout (limit
> 3min)!
> [331290.070475] systemd[1]: systemd-journald.service: Killing process 528
> (systemd-journal) with signal SIGABRT.
> [331290.096246] systemd[1]: logrotate.service: Deactivated successfully.
> [331290.285285] systemd-coredump[1267441]: Process 528 (systemd-journal) of
> user 0 dumped core.
> [331290.286002] systemd-coredump[1267441]: Coredump diverted to
>
> /var/lib/systemd/coredump/core.systemd-journal.0.39df468c96764e8c8fffc53a0d0a47d1.528.1686881003000000.zst
> [331290.287045] systemd-coredump[1267441]: Module libaudit.so.1 without
> build-id.
> [331290.287574] systemd-coredump[1267441]: Module libnftnl.so.11 without
> build-id.
> [331290.288111] systemd-coredump[1267441]: Module libmnl.so.0 without
> build-id.
> [331290.288587] systemd-coredump[1267441]: Module libgpg-error.so.0 without
> build-id.
> [331290.289076] systemd-coredump[1267441]: Module libattr.so.1 without
> build-id.
> [331290.289541] systemd-coredump[1267441]: Module libzstd.so.1 without
> build-id.
> [331290.289991] systemd-coredump[1267441]: Module liblzma.so.5 without
> build-id.
> [331290.290446] systemd-coredump[1267441]: Module libseccomp.so.2 without
> build-id.
> [331290.290940] systemd-coredump[1267441]: Module libpam.so.0 without
> build-id.
> [331290.291419] systemd-coredump[1267441]: Module liblz4.so.1 without
> build-id.
> [331290.291893] systemd-coredump[1267441]: Module libkmod.so.2 without
> build-id.
> [331290.292370] systemd-coredump[1267441]: Module libip4tc.so.2 without
> build-id.
> [331290.295165] systemd-coredump[1267441]: Module libgcrypt.so.20 without
> build-id.
> [331290.295678] systemd-coredump[1267441]: Module libcrypt.so.2 without
> build-id.
> [331290.296160] systemd-coredump[1267441]: Module libcap.so.2 without
> build-id.
> [331290.296708] systemd-coredump[1267441]: Module libacl.so.1 without
> build-id.
> [331290.297206] systemd-coredump[1267441]: Module libsystemd-shared-253.so
> without build-id.
> [331290.297768] systemd-coredump[1267441]: Module systemd-journald without
> build-id.
> [331290.298273] systemd-coredump[1267441]: Stack trace of thread 528:
> [331290.298714] systemd-coredump[1267441]: #0  0x00007f3e96431de8
> check_object_header (libsystemd-shared-253.so + 0x231de8)
> [331290.299435] systemd-coredump[1267441]: #1  0x00007f3e964330d2
> journal_file_move_to_object (libsystemd-shared-253.so + 0x2330d2)
> [331290.300199] systemd-coredump[1267441]: #2  0x00007f3e96434a71
> journal_file_find_data_object_with_hash (libsystemd-shared-253.so + 0x234a71)
> [331290.302207] systemd-coredump[1267441]: #3  0x00007f3e96434d3b
> journal_file_append_data (libsystemd-shared-253.so + 0x234d3b)
> [331290.302983] systemd-coredump[1267441]: #4  0x00007f3e96437243
> journal_file_append_entry (libsystemd-shared-253.so + 0x237243)
> [331290.303767] systemd-coredump[1267441]: #5  0x000056536a4a7cd5
> server_dispatch_message_real (systemd-journald + 0x10cd5)
> [331290.304504] systemd-coredump[1267441]: #6  0x000056536a4a134b
> dev_kmsg_record (systemd-journald + 0xa34b)
> [331290.305195] systemd-coredump[1267441]: #7  0x000056536a4a182b
> server_read_dev_kmsg (systemd-journald + 0xa82b)
> [331290.305922] systemd-coredump[1267441]: #8  0x00007f3e9645c140
> source_dispatch (libsystemd-shared-253.so + 0x25c140)
> [331290.306640] systemd-coredump[1267441]: #9  0x00007f3e9645c42d
> sd_event_dispatch (libsystemd-shared-253.so + 0x25c42d)
> [331290.307350] systemd-coredump[1267441]: #10 0x00007f3e9645cb48
> sd_event_run
> (libsystemd-shared-253.so + 0x25cb48)
> [331290.308043] systemd-coredump[1267441]: #11 0x000056536a4a0568 main
> (systemd-journald + 0x9568)
> [331290.308652] systemd-coredump[1267441]: #12 0x00007f3e9603dace
> __libc_start_call_main (libc.so.6 + 0x23ace)
> [331290.309292] systemd-coredump[1267441]: #13 0x00007f3e9603db89
> __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x23b89)
> [331290.310048] systemd-coredump[1267441]: #14 0x000056536a4a0835 _start
> (systemd-journald + 0x9835)
> [331290.310692] systemd-coredump[1267441]: ELF object binary architecture:
> AMD
> x86-64
> 
> 
> This go back to normal until around 20:03 (around 14 hours later):
> 
> [389370.873026] rcu: INFO: rcu_preempt self-detected stall on CPU
> [389370.873481] rcu:    0-....: (21000 ticks this GP)
> idle=91fc/1/0x4000000000000000 softirq=85252827/85252827 fqs=4704
> [389370.874126]         (t=21002 jiffies g=167843445 q=13889 ncpus=3)
> [389370.874500] CPU: 0 PID: 2202919 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [389370.874979] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [389370.875710] RIP: 0010:xas_descend+0x26/0x70
> [389370.875991] Code: cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89
> d0
> 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 <75> 08
> 48
> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02
> [389370.877164] RSP: 0018:ffffb427c4917bf0 EFLAGS: 00000246
> [389370.877512] RAX: ffff98871f8dbdaa RBX: ffffb427c4917d70 RCX:
> 0000000000000002
> [389370.877967] RDX: 0000000000000005 RSI: ffff988876d3c000 RDI:
> ffffb427c4917c00
> [389370.878426] RBP: 000000000000f177 R08: ffffb427c4917e68 R09:
> ffff988846485d38
> [389370.878880] R10: ffffb427c4917e60 R11: ffff988846485d38 R12:
> 000000000000f177
> [389370.879341] R13: ffff988827b4ae00 R14: 000000000000f176 R15:
> ffffb427c4917e90
> [389370.879798] FS:  00007ff8de817800(0000) GS:ffff98887ac00000(0000)
> knlGS:0000000000000000
> [389370.880308] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [389370.880683] CR2: 00007ff881c8c000 CR3: 000000010dfea000 CR4:
> 00000000000006f0
> [389370.881139] Call Trace:
> [389370.881309]  <IRQ>
> [389370.881463]  ? rcu_dump_cpu_stacks+0xc8/0x100
> [389370.881755]  ? rcu_sched_clock_irq.cold+0x15b/0x2fb
> [389370.882077]  ? sched_slice+0x87/0x140
> [389370.882325]  ? perf_event_task_tick+0x64/0x370
> [389370.882629]  ? __cgroup_account_cputime_field+0x5b/0xa0
> [389370.882968]  ? update_process_times+0x77/0xb0
> [389370.883258]  ? tick_sched_handle+0x34/0x50
> [389370.883538]  ? tick_sched_timer+0x6f/0x80
> [389370.883802]  ? tick_sched_do_timer+0xa0/0xa0
> [389370.884084]  ? __hrtimer_run_queues+0x112/0x2b0
> [389370.884388]  ? hrtimer_interrupt+0xfe/0x220
> [389370.884666]  ? __sysvec_apic_timer_interrupt+0x7f/0x170
> [389370.885006]  ? sysvec_apic_timer_interrupt+0x99/0xc0
> [389370.885338]  </IRQ>
> [389370.885488]  <TASK>
> [389370.885636]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [389370.885983]  ? xas_descend+0x26/0x70
> [389370.886224]  xas_load+0x30/0x40
> [389370.886456]  filemap_get_read_batch+0x16e/0x250
> [389370.886758]  filemap_get_pages+0xa9/0x630
> [389370.887026]  ? atime_needs_update+0x104/0x180
> [389370.887317]  ? touch_atime+0x46/0x1f0
> [389370.887573]  filemap_read+0xd2/0x340
> [389370.887817]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [389370.888210]  xfs_file_read_iter+0x6a/0xd0 [xfs]
> [389370.888571]  vfs_read+0x23c/0x310
> [389370.888797]  __x64_sys_pread64+0x94/0xc0
> [389370.889058]  do_syscall_64+0x3a/0x90
> [389370.889299]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [389370.889634] RIP: 0033:0x7ff8de90e747
> [389370.889900] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [389370.891063] RSP: 002b:00007fff5206a848 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [389370.891549] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [389370.892003] RDX: 0000000000002000 RSI: 00007ff881c8ce80 RDI:
> 00000000000001c7
> [389370.892466] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [389370.892923] R10: 000000000f176000 R11: 0000000000000202 R12:
> 00007ff8cdd91278
> [389370.893386] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5e55828
> [389370.893843]  </TASK>
> 
> 
> At this point we have to restart the VM externally because it didn't respond
> properly any longer.
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are watching the assignee of the bug.


Liebe Grüße,
Christian Theune

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux