Re: [Bug 217572] New: Initial blocked tasks causing deterioration over hours until (nearly) complete system lockup and data loss with PostgreSQL 13

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Argh. Sorry. My brain is a bit fried today and I - for some reason - thought I did the bug reports last week and not yesterday. I didn’t want to appear pushy, so, sorry for the early follow-up.

*ducks*

> On 20. Jun 2023, at 17:10, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> 
> Hi,
> 
> is there anything I can do to raise interest on this one? :)
> 
>> On 19. Jun 2023, at 10:29, bugzilla-daemon@xxxxxxxxxx wrote:
>> 
>> https://bugzilla.kernel.org/show_bug.cgi?id=217572
>> 
>>           Bug ID: 217572
>>          Summary: Initial blocked tasks causing deterioration over hours
>>                   until (nearly) complete system lockup and data loss
>>                   with PostgreSQL 13
>>          Product: File System
>>          Version: 2.5
>>         Hardware: All
>>               OS: Linux
>>           Status: NEW
>>         Severity: high
>>         Priority: P3
>>        Component: XFS
>>         Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx
>>         Reporter: ct@xxxxxxxxxxxxxxx
>>       Regression: No
>> 
>> Last Friday we experienced the following hung task messages with PostgreSQL
>> while performing our nightly backup using pg_dump. Normally this takes at most
>> a few minutes with IO being stressed. This time it caused high SYS CPU time,
>> went on for almost 20 minutes and caused the PostgreSQL dump to fail with
>> inconsistent data.
>> 
>> Around 3:50 AM we got this:
>> 
>> [330289.821046] INFO: task .postgres-wrapp:11884 blocked for more than 122
>> seconds.
>> [330289.821830]       Not tainted 6.1.31 #1-NixOS
>> [330289.822285] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [330289.823098] task:.postgres-wrapp state:D stack:0     pid:11884 ppid:11881 
>> flags:0x00000002
>> [330289.823911] Call Trace:
>> [330289.824221]  <TASK>
>> [330289.824451]  __schedule+0x35d/0x1370
>> [330289.824858]  ? mntput_no_expire+0x4a/0x250
>> [330289.825307]  schedule+0x5d/0xe0
>> [330289.825630]  rwsem_down_write_slowpath+0x34e/0x730
>> [330289.826128]  xfs_ilock+0xeb/0xf0 [xfs]
>> [330289.826599]  xfs_file_buffered_write+0x119/0x300 [xfs]
>> [330289.827212]  ? selinux_file_permission+0x10b/0x150
>> [330289.827683]  vfs_write+0x244/0x400
>> [330289.828049]  __x64_sys_pwrite64+0x94/0xc0
>> [330289.828459]  do_syscall_64+0x3a/0x90
>> [330289.828801]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> [330289.829302] RIP: 0033:0x7ff8de90e7f7
>> [330289.829681] RSP: 002b:00007fff52069b08 EFLAGS: 00000202 ORIG_RAX:
>> 0000000000000012
>> [330289.830408] RAX: ffffffffffffffda RBX: 0000562bb434f510 RCX:
>> 00007ff8de90e7f7
>> [330289.831073] RDX: 0000000000002000 RSI: 00007ff888b52e80 RDI:
>> 000000000000003b
>> [330289.831762] RBP: 00000000000021b0 R08: 000000000a000010 R09:
>> 0000000000000040
>> [330289.832440] R10: 0000000004ed8000 R11: 0000000000000202 R12:
>> 0000000000002000
>> [330289.833130] R13: 0000000004ed8000 R14: 00007ff8de8176c8 R15:
>> 0000562bb434af75
>> [330289.833803]  </TASK>
>> [330289.834064] INFO: task .postgres-wrapp:1245532 blocked for more than 122
>> seconds.
>> [330289.834771]       Not tainted 6.1.31 #1-NixOS
>> [330289.835209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [330289.835926] task:.postgres-wrapp state:D stack:0     pid:1245532 ppid:11881
>> flags:0x00000002
>> [330289.836752] Call Trace:
>> [330289.837010]  <TASK>
>> [330289.837258]  __schedule+0x35d/0x1370
>> [330289.837622]  ? page_add_file_rmap+0xba/0x2f0
>> [330289.838047]  ? do_set_pte+0x174/0x1c0
>> [330289.838420]  ? unix_stream_read_generic+0x223/0xa60
>> [330289.838887]  schedule+0x5d/0xe0
>> [330289.839265]  schedule_preempt_disabled+0x14/0x30
>> [330289.839758]  rwsem_down_read_slowpath+0x29e/0x4f0
>> [330289.840521]  down_read+0x47/0xb0
>> [330289.840853]  xfs_ilock+0x79/0xf0 [xfs]
>> [330289.841346]  xfs_file_buffered_read+0x44/0xd0 [xfs]
>> [330289.841945]  xfs_file_read_iter+0x6a/0xd0 [xfs]
>> [330289.842496]  vfs_read+0x23c/0x310
>> [330289.842845]  __x64_sys_pread64+0x94/0xc0
>> [330289.843303]  do_syscall_64+0x3a/0x90
>> [330289.843857]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> [330289.844420] RIP: 0033:0x7ff8de90e747
>> [330289.844898] RSP: 002b:00007fff5206aae8 EFLAGS: 00000202 ORIG_RAX:
>> 0000000000000011
>> [330289.845705] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
>> 00007ff8de90e747
>> [330289.846488] RDX: 0000000000002000 RSI: 00007ff8896e0e80 RDI:
>> 0000000000000010
>> [330289.847133] RBP: 0000000000000001 R08: 000000000a00000d R09:
>> 0000000000000000
>> [330289.847627] R10: 0000000018fb8000 R11: 0000000000000202 R12:
>> 00007ff8cdd91278
>> [330289.848113] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
>> 0000562bb5e89e70
>> [330289.848631]  </TASK>
>> 
>> 
>> This keeps going on for a few minutes until:
>> 
>> [330584.618978] rcu: INFO: rcu_preempt self-detected stall on CPU
>> [330584.619413] rcu:    1-....: (20999 ticks this GP)
>> idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=5231
>> [330584.620052]         (t=21000 jiffies g=142859597 q=21168 ncpus=3)
>> [330584.620409] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
>> #1-NixOS
>> [330584.620880] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>> [330584.621588] RIP: 0010:xas_load+0xb/0x40
>> [330584.621844] Code: 04 48 8b 44 c2 08 c3 cc cc cc cc 48 8b 07 48 8b 40 08 c3
>> cc cc cc cc 0f 1f 84 00 00 00 00 00 e8 3b ff ff ff 48 89 c2 83 e2 03 <48> 83 fa
>> 02 75 08 48 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f
>> [330584.622996] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000202
>> [330584.623337] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX:
>> 0000000000000002
>> [330584.623787] RDX: 0000000000000002 RSI: ffff98871f9d2920 RDI:
>> ffffb427c3387c00
>> [330584.624239] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
>> ffffda1444482000
>> [330584.624686] R10: 0000000000000001 R11: 0000000000000001 R12:
>> 000000000000f161
>> [330584.625136] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
>> ffffb427c3387e90
>> [330584.625591] FS:  00007ff8de817800(0000) GS:ffff98887ac80000(0000)
>> knlGS:0000000000000000
>> [330584.626097] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [330584.626467] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
>> 00000000000006e0
>> [330584.626954] Call Trace:
>> [330584.627125]  <IRQ>
>> [330584.627281]  ? rcu_dump_cpu_stacks+0xc8/0x100
>> [330584.627567]  ? rcu_sched_clock_irq.cold+0x15b/0x2fb
>> [330584.627882]  ? sched_slice+0x87/0x140
>> [330584.628126]  ? timekeeping_update+0xdd/0x130
>> [330584.628414]  ? __cgroup_account_cputime_field+0x5b/0xa0
>> [330584.628751]  ? update_process_times+0x77/0xb0
>> [330584.629036]  ? update_wall_time+0xc/0x20
>> [330584.629300]  ? tick_sched_handle+0x34/0x50
>> [330584.629564]  ? tick_sched_timer+0x6f/0x80
>> [330584.629823]  ? tick_sched_do_timer+0xa0/0xa0
>> [330584.630103]  ? __hrtimer_run_queues+0x112/0x2b0
>> [330584.630404]  ? hrtimer_interrupt+0xfe/0x220
>> [330584.630678]  ? __sysvec_apic_timer_interrupt+0x7f/0x170
>> [330584.631016]  ? sysvec_apic_timer_interrupt+0x99/0xc0
>> [330584.631339]  </IRQ>
>> [330584.631485]  <TASK>
>> [330584.631631]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>> [330584.631974]  ? xas_load+0xb/0x40
>> [330584.632195]  ? xas_load+0x30/0x40
>> [330584.632415]  filemap_get_read_batch+0x16e/0x250
>> [330584.632710]  filemap_get_pages+0xa9/0x630
>> [330584.632972]  ? memcg_check_events+0xda/0x210
>> [330584.633259]  ? free_unref_page_commit+0x7c/0x170
>> [330584.633560]  ? _raw_spin_unlock_irqrestore+0x23/0x40
>> [330584.633880]  ? free_unref_page+0x1ac/0x220
>> [330584.634146]  filemap_read+0xd2/0x340
>> [330584.634389]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
>> [330584.634778]  xfs_file_read_iter+0x6a/0xd0 [xfs]
>> [330584.635123]  vfs_read+0x23c/0x310
>> [330584.635354]  __x64_sys_pread64+0x94/0xc0
>> [330584.635609]  do_syscall_64+0x3a/0x90
>> [330584.635846]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> [330584.636174] RIP: 0033:0x7ff8de90e747
>> [330584.636437] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90
>> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00
>> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
>> [330584.637582] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
>> 0000000000000011
>> [330584.638056] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
>> 00007ff8de90e747
>> [330584.638506] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
>> 0000000000000230
>> [330584.638954] RBP: 0000000000000001 R08: 000000000a00000d R09:
>> 0000000000000000
>> [330584.639411] R10: 000000000f160000 R11: 0000000000000202 R12:
>> 00007ff8cdd92688
>> [330584.639862] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
>> 0000562bb5fe04d0
>> [330584.640313]  </TASK>
>> [330584.721995] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks:
>> { 1-.... } 21015 jiffies s: 7297 root: 0x2/.
>> [330584.722930] rcu: blocking rcu_node structures (internal RCU debug):
>> [330584.723428] Sending NMI from CPU 0 to CPUs 1:
>> [330584.723763] NMI backtrace for cpu 1
>> [330584.723769] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
>> #1-NixOS
>> [330584.723772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>> [330584.723775] RIP: 0010:xas_load+0x29/0x40
>> [330584.723783] Code: 00 e8 3b ff ff ff 48 89 c2 83 e2 03 48 83 fa 02 75 08 48
>> 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f 10 48 8d 70 fe 38 48 fe <72> ee e8
>> 20 fe ff ff 80 3e 00 75 d0 c3 cc cc cc cc 66 0f 1f 44 00
>> [330584.723785] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000246
>> [330584.723787] RAX: ffff98871f8dbdaa RBX: ffffb427c3387d70 RCX:
>> 0000000000000000
>> [330584.723788] RDX: 0000000000000002 RSI: ffff98871f8dbda8 RDI:
>> ffffb427c3387c00
>> [330584.723789] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
>> ffffda1444482000
>> [330584.723790] R10: 0000000000000001 R11: 0000000000000001 R12:
>> 000000000000f161
>> [330584.723791] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
>> ffffb427c3387e90
>> [330584.723793] FS:  00007ff8de817800(0000) GS:ffff98887ac80000(0000)
>> knlGS:0000000000000000
>> [330584.723794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [330584.723795] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
>> 00000000000006e0
>> [330584.723798] Call Trace:
>> [330584.723801]  <NMI>
>> [330584.723805]  ? nmi_cpu_backtrace.cold+0x1b/0x76
>> [330584.723810]  ? nmi_cpu_backtrace_handler+0xd/0x20
>> [330584.723814]  ? nmi_handle+0x5d/0x120
>> [330584.723817]  ? xas_load+0x29/0x40
>> [330584.723818]  ? default_do_nmi+0x69/0x170
>> [330584.723821]  ? exc_nmi+0x13c/0x170
>> [330584.723823]  ? end_repeat_nmi+0x16/0x67
>> [330584.723828]  ? xas_load+0x29/0x40
>> [330584.723830]  ? xas_load+0x29/0x40
>> [330584.723832]  ? xas_load+0x29/0x40
>> [330584.723834]  </NMI>
>> [330584.723834]  <TASK>
>> [330584.723835]  filemap_get_read_batch+0x16e/0x250
>> [330584.723840]  filemap_get_pages+0xa9/0x630
>> [330584.723842]  ? memcg_check_events+0xda/0x210
>> [330584.723845]  ? free_unref_page_commit+0x7c/0x170
>> [330584.723849]  ? _raw_spin_unlock_irqrestore+0x23/0x40
>> [330584.723851]  ? free_unref_page+0x1ac/0x220
>> [330584.723852]  filemap_read+0xd2/0x340
>> [330584.723857]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
>> [330584.723935]  xfs_file_read_iter+0x6a/0xd0 [xfs]
>> [330584.723990]  vfs_read+0x23c/0x310
>> [330584.723995]  __x64_sys_pread64+0x94/0xc0
>> [330584.723997]  do_syscall_64+0x3a/0x90
>> [330584.724000]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> [330584.724003] RIP: 0033:0x7ff8de90e747
>> [330584.724019] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90
>> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00
>> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
>> [330584.724020] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
>> 0000000000000011
>> [330584.724021] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
>> 00007ff8de90e747
>> [330584.724022] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
>> 0000000000000230
>> [330584.724023] RBP: 0000000000000001 R08: 000000000a00000d R09:
>> 0000000000000000
>> [330584.724023] R10: 000000000f160000 R11: 0000000000000202 R12:
>> 00007ff8cdd92688
>> [330584.724024] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
>> 0000562bb5fe04d0
>> [330584.724026]  </TASK>
>> 
>> 
>> This keeps repeating until around 4:03 which ends in systemd-journal getting
>> coredumped:
>> 
>> [331277.846966] rcu: INFO: rcu_preempt self-detected stall on CPU
>> [331277.847413] rcu:    1-....: (713858 ticks this GP)
>> idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=311996
>> [331277.848088]         (t=714253 jiffies g=142859597 q=1821602 ncpus=3)
>> [331277.848462] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
>> #1-NixOS
>> [331277.848941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>> [331277.849667] RIP: 0010:xas_descend+0x22/0x70
>> [331277.849952] Code: cc cc cc cc cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83
>> e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 <48> 83 f9
>> 02 75 08 48 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc
>> [331277.851120] RSP: 0018:ffffb427c3387bf0 EFLAGS: 00000202
>> [331277.851468] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX:
>> 0000000000000002
>> [331277.851928] RDX: 000000000000000f RSI: ffff98871f9d2920 RDI:
>> ffffb427c3387c00
>> [331277.852386] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
>> ffffda1444482000
>> [331277.852847] R10: 0000000000000001 R11: 0000000000000001 R12:
>> 000000000000f161
>> [331277.853303] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
>> ffffb427c3387e90
>> [331277.853766] FS:  00007ff8de817800(0000) GS:ffff98887ac80000(0000)
>> knlGS:0000000000000000
>> [331277.854278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [331277.854653] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
>> 00000000000006e0
>> [331277.855111] Call Trace:
>> [331277.855284]  <IRQ>
>> [331277.855433]  ? rcu_dump_cpu_stacks+0xc8/0x100
>> [331277.855730]  ? rcu_sched_clock_irq.cold+0x15b/0x2fb
>> [331277.856049]  ? sched_slice+0x87/0x140
>> [331277.856294]  ? timekeeping_update+0xdd/0x130
>> [331277.856587]  ? __cgroup_account_cputime_field+0x5b/0xa0
>> [331277.856930]  ? update_process_times+0x77/0xb0
>> [331277.857222]  ? update_wall_time+0xc/0x20
>> [331277.857482]  ? tick_sched_handle+0x34/0x50
>> [331277.857758]  ? tick_sched_timer+0x6f/0x80
>> [331277.858024]  ? tick_sched_do_timer+0xa0/0xa0
>> [331277.858306]  ? __hrtimer_run_queues+0x112/0x2b0
>> [331277.858613]  ? hrtimer_interrupt+0xfe/0x220
>> [331277.858891]  ? __sysvec_apic_timer_interrupt+0x7f/0x170
>> [331277.859235]  ? sysvec_apic_timer_interrupt+0x99/0xc0
>> [331277.859558]  </IRQ>
>> [331277.859710]  <TASK>
>> [331277.859857]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>> [331277.860206]  ? xas_descend+0x22/0x70
>> [331277.860446]  xas_load+0x30/0x40
>> [331277.860665]  filemap_get_read_batch+0x16e/0x250
>> [331277.860967]  filemap_get_pages+0xa9/0x630
>> [331277.861233]  ? memcg_check_events+0xda/0x210
>> [331277.861517]  ? free_unref_page_commit+0x7c/0x170
>> [331277.861834]  ? _raw_spin_unlock_irqrestore+0x23/0x40
>> [331277.862158]  ? free_unref_page+0x1ac/0x220
>> [331277.862427]  filemap_read+0xd2/0x340
>> [331277.862677]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
>> [331277.863072]  xfs_file_read_iter+0x6a/0xd0 [xfs]
>> [331277.863424]  vfs_read+0x23c/0x310
>> [331277.863657]  __x64_sys_pread64+0x94/0xc0
>> [331277.863917]  do_syscall_64+0x3a/0x90
>> [331277.864159]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> [331277.864488] RIP: 0033:0x7ff8de90e747
>> [331277.864752] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90
>> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00
>> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
>> [331277.865917] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
>> 0000000000000011
>> [331277.866395] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
>> 00007ff8de90e747
>> [331277.866848] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
>> 0000000000000230
>> [331277.867298] RBP: 0000000000000001 R08: 000000000a00000d R09:
>> 0000000000000000
>> [331277.867750] R10: 000000000f160000 R11: 0000000000000202 R12:
>> 00007ff8cdd92688
>> [331277.868199] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
>> 0000562bb5fe04d0
>> [331277.868658]  </TASK>
>> Fri Jun 16 04:03:23 AM CEST 2023 -- SERIAL CONSOLE IS LIVE --
>> [331289.987247] systemd[1]: Starting Serial console liveness marker...
>> [331289.992645] systemd[1]: nscd.service: Deactivated successfully.
>> [331290.011838] systemd[1]: nscd.service: Consumed 3min 16.251s CPU time,
>> received 12.0M IP traffic, sent 8.2M IP traffic.
>> [331290.020289] systemd[1]: serial-console-liveness.service: Deactivated
>> successfully.
>> [331290.035818] systemd[1]: Finished Serial console liveness marker.
>> [331290.068776] systemd[1]: Started Logrotate Service.
>> [331290.069700] systemd[1]: systemd-journald.service: Watchdog timeout (limit
>> 3min)!
>> [331290.070475] systemd[1]: systemd-journald.service: Killing process 528
>> (systemd-journal) with signal SIGABRT.
>> [331290.096246] systemd[1]: logrotate.service: Deactivated successfully.
>> [331290.285285] systemd-coredump[1267441]: Process 528 (systemd-journal) of
>> user 0 dumped core.
>> [331290.286002] systemd-coredump[1267441]: Coredump diverted to
>> /var/lib/systemd/coredump/core.systemd-journal.0.39df468c96764e8c8fffc53a0d0a47d1.528.1686881003000000.zst
>> [331290.287045] systemd-coredump[1267441]: Module libaudit.so.1 without
>> build-id.
>> [331290.287574] systemd-coredump[1267441]: Module libnftnl.so.11 without
>> build-id.
>> [331290.288111] systemd-coredump[1267441]: Module libmnl.so.0 without build-id.
>> [331290.288587] systemd-coredump[1267441]: Module libgpg-error.so.0 without
>> build-id.
>> [331290.289076] systemd-coredump[1267441]: Module libattr.so.1 without
>> build-id.
>> [331290.289541] systemd-coredump[1267441]: Module libzstd.so.1 without
>> build-id.
>> [331290.289991] systemd-coredump[1267441]: Module liblzma.so.5 without
>> build-id.
>> [331290.290446] systemd-coredump[1267441]: Module libseccomp.so.2 without
>> build-id.
>> [331290.290940] systemd-coredump[1267441]: Module libpam.so.0 without build-id.
>> [331290.291419] systemd-coredump[1267441]: Module liblz4.so.1 without build-id.
>> [331290.291893] systemd-coredump[1267441]: Module libkmod.so.2 without
>> build-id.
>> [331290.292370] systemd-coredump[1267441]: Module libip4tc.so.2 without
>> build-id.
>> [331290.295165] systemd-coredump[1267441]: Module libgcrypt.so.20 without
>> build-id.
>> [331290.295678] systemd-coredump[1267441]: Module libcrypt.so.2 without
>> build-id.
>> [331290.296160] systemd-coredump[1267441]: Module libcap.so.2 without build-id.
>> [331290.296708] systemd-coredump[1267441]: Module libacl.so.1 without build-id.
>> [331290.297206] systemd-coredump[1267441]: Module libsystemd-shared-253.so
>> without build-id.
>> [331290.297768] systemd-coredump[1267441]: Module systemd-journald without
>> build-id.
>> [331290.298273] systemd-coredump[1267441]: Stack trace of thread 528:
>> [331290.298714] systemd-coredump[1267441]: #0  0x00007f3e96431de8
>> check_object_header (libsystemd-shared-253.so + 0x231de8)
>> [331290.299435] systemd-coredump[1267441]: #1  0x00007f3e964330d2
>> journal_file_move_to_object (libsystemd-shared-253.so + 0x2330d2)
>> [331290.300199] systemd-coredump[1267441]: #2  0x00007f3e96434a71
>> journal_file_find_data_object_with_hash (libsystemd-shared-253.so + 0x234a71)
>> [331290.302207] systemd-coredump[1267441]: #3  0x00007f3e96434d3b
>> journal_file_append_data (libsystemd-shared-253.so + 0x234d3b)
>> [331290.302983] systemd-coredump[1267441]: #4  0x00007f3e96437243
>> journal_file_append_entry (libsystemd-shared-253.so + 0x237243)
>> [331290.303767] systemd-coredump[1267441]: #5  0x000056536a4a7cd5
>> server_dispatch_message_real (systemd-journald + 0x10cd5)
>> [331290.304504] systemd-coredump[1267441]: #6  0x000056536a4a134b
>> dev_kmsg_record (systemd-journald + 0xa34b)
>> [331290.305195] systemd-coredump[1267441]: #7  0x000056536a4a182b
>> server_read_dev_kmsg (systemd-journald + 0xa82b)
>> [331290.305922] systemd-coredump[1267441]: #8  0x00007f3e9645c140
>> source_dispatch (libsystemd-shared-253.so + 0x25c140)
>> [331290.306640] systemd-coredump[1267441]: #9  0x00007f3e9645c42d
>> sd_event_dispatch (libsystemd-shared-253.so + 0x25c42d)
>> [331290.307350] systemd-coredump[1267441]: #10 0x00007f3e9645cb48 sd_event_run
>> (libsystemd-shared-253.so + 0x25cb48)
>> [331290.308043] systemd-coredump[1267441]: #11 0x000056536a4a0568 main
>> (systemd-journald + 0x9568)
>> [331290.308652] systemd-coredump[1267441]: #12 0x00007f3e9603dace
>> __libc_start_call_main (libc.so.6 + 0x23ace)
>> [331290.309292] systemd-coredump[1267441]: #13 0x00007f3e9603db89
>> __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x23b89)
>> [331290.310048] systemd-coredump[1267441]: #14 0x000056536a4a0835 _start
>> (systemd-journald + 0x9835)
>> [331290.310692] systemd-coredump[1267441]: ELF object binary architecture: AMD
>> x86-64
>> 
>> 
>> This go back to normal until around 20:03 (around 14 hours later):
>> 
>> [389370.873026] rcu: INFO: rcu_preempt self-detected stall on CPU
>> [389370.873481] rcu:    0-....: (21000 ticks this GP)
>> idle=91fc/1/0x4000000000000000 softirq=85252827/85252827 fqs=4704
>> [389370.874126]         (t=21002 jiffies g=167843445 q=13889 ncpus=3)
>> [389370.874500] CPU: 0 PID: 2202919 Comm: .postgres-wrapp Not tainted 6.1.31
>> #1-NixOS
>> [389370.874979] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>> [389370.875710] RIP: 0010:xas_descend+0x26/0x70
>> [389370.875991] Code: cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0
>> 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 <75> 08 48
>> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02
>> [389370.877164] RSP: 0018:ffffb427c4917bf0 EFLAGS: 00000246
>> [389370.877512] RAX: ffff98871f8dbdaa RBX: ffffb427c4917d70 RCX:
>> 0000000000000002
>> [389370.877967] RDX: 0000000000000005 RSI: ffff988876d3c000 RDI:
>> ffffb427c4917c00
>> [389370.878426] RBP: 000000000000f177 R08: ffffb427c4917e68 R09:
>> ffff988846485d38
>> [389370.878880] R10: ffffb427c4917e60 R11: ffff988846485d38 R12:
>> 000000000000f177
>> [389370.879341] R13: ffff988827b4ae00 R14: 000000000000f176 R15:
>> ffffb427c4917e90
>> [389370.879798] FS:  00007ff8de817800(0000) GS:ffff98887ac00000(0000)
>> knlGS:0000000000000000
>> [389370.880308] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [389370.880683] CR2: 00007ff881c8c000 CR3: 000000010dfea000 CR4:
>> 00000000000006f0
>> [389370.881139] Call Trace:
>> [389370.881309]  <IRQ>
>> [389370.881463]  ? rcu_dump_cpu_stacks+0xc8/0x100
>> [389370.881755]  ? rcu_sched_clock_irq.cold+0x15b/0x2fb
>> [389370.882077]  ? sched_slice+0x87/0x140
>> [389370.882325]  ? perf_event_task_tick+0x64/0x370
>> [389370.882629]  ? __cgroup_account_cputime_field+0x5b/0xa0
>> [389370.882968]  ? update_process_times+0x77/0xb0
>> [389370.883258]  ? tick_sched_handle+0x34/0x50
>> [389370.883538]  ? tick_sched_timer+0x6f/0x80
>> [389370.883802]  ? tick_sched_do_timer+0xa0/0xa0
>> [389370.884084]  ? __hrtimer_run_queues+0x112/0x2b0
>> [389370.884388]  ? hrtimer_interrupt+0xfe/0x220
>> [389370.884666]  ? __sysvec_apic_timer_interrupt+0x7f/0x170
>> [389370.885006]  ? sysvec_apic_timer_interrupt+0x99/0xc0
>> [389370.885338]  </IRQ>
>> [389370.885488]  <TASK>
>> [389370.885636]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>> [389370.885983]  ? xas_descend+0x26/0x70
>> [389370.886224]  xas_load+0x30/0x40
>> [389370.886456]  filemap_get_read_batch+0x16e/0x250
>> [389370.886758]  filemap_get_pages+0xa9/0x630
>> [389370.887026]  ? atime_needs_update+0x104/0x180
>> [389370.887317]  ? touch_atime+0x46/0x1f0
>> [389370.887573]  filemap_read+0xd2/0x340
>> [389370.887817]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
>> [389370.888210]  xfs_file_read_iter+0x6a/0xd0 [xfs]
>> [389370.888571]  vfs_read+0x23c/0x310
>> [389370.888797]  __x64_sys_pread64+0x94/0xc0
>> [389370.889058]  do_syscall_64+0x3a/0x90
>> [389370.889299]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> [389370.889634] RIP: 0033:0x7ff8de90e747
>> [389370.889900] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90
>> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00
>> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
>> [389370.891063] RSP: 002b:00007fff5206a848 EFLAGS: 00000202 ORIG_RAX:
>> 0000000000000011
>> [389370.891549] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
>> 00007ff8de90e747
>> [389370.892003] RDX: 0000000000002000 RSI: 00007ff881c8ce80 RDI:
>> 00000000000001c7
>> [389370.892466] RBP: 0000000000000001 R08: 000000000a00000d R09:
>> 0000000000000000
>> [389370.892923] R10: 000000000f176000 R11: 0000000000000202 R12:
>> 00007ff8cdd91278
>> [389370.893386] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
>> 0000562bb5e55828
>> [389370.893843]  </TASK>
>> 
>> 
>> At this point we have to restart the VM externally because it didn't respond
>> properly any longer.
>> 
>> -- 
>> You may reply to this email to add a comment.
>> 
>> You are receiving this mail because:
>> You are watching the assignee of the bug.
> 
> 
> Liebe Grüße,
> Christian Theune
> 
> -- 
> Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
> 

Liebe Grüße,
Christian Theune

-- 
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux