6.2 series regression, transient stalls during write intensive workloads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



kernel 6.2.14-300.fc38.x86_64

Summary: User with LSI MegaRAID SAS 2108 in raid5 with Btrfs is reporting transient (but reproducible) stalls about 10s at a time, during which time the system UI is unresponsive both CLI and GUI. PSI reports no memory or CPU pressure, but light IO pressure ~10%. Some blocked tasks are captured with sysrq+w. The IO scheduler is BFQ, changing to mq-deadline might relieve some of the stalls but is inconclusive because the problem still happens. The problem does not occur with kernel 6.1.18.

Downstream bug report:
regression - transient stalls during write-intensive workload (edit) 
https://bugzilla.redhat.com/show_bug.cgi?id=2193259

Attachment, output from "(cd /sys/kernel/debug/block/sda && find . -type f -exec grep -aH . {} \;)"
https://bugzilla-attachments.redhat.com/attachment.cgi?id=1962546

sysrq+w excerpt:

[21672.537148] sysrq: Show Blocked State
[21672.537553] task:worker          state:D stack:0     pid:9723  ppid:1      flags:0x00004002
[21672.537562] Call Trace:
[21672.537565]  <TASK>
[21672.537570]  __schedule+0x3cc/0x13e0
[21672.537585]  schedule+0x5d/0xe0
[21672.537590]  io_schedule+0x42/0x70
[21672.537595]  folio_wait_bit_common+0x13d/0x370
[21672.537605]  ? __pfx_wake_page_function+0x10/0x10
[21672.537611]  folio_wait_writeback+0x28/0x80
[21672.537618]  extent_write_cache_pages+0x3ac/0x540
[21672.537628]  extent_writepages+0x5a/0x120
[21672.537633]  ? __pfx_end_bio_extent_writepage+0x10/0x10
[21672.537637]  do_writepages+0xc0/0x1d0
[21672.537644]  ? hrtimer_cancel+0x11/0x20
[21672.537650]  filemap_fdatawrite_wbc+0x5f/0x80
[21672.537654]  __filemap_fdatawrite_range+0x58/0x80
[21672.537662]  start_ordered_ops.constprop.0+0x73/0xd0
[21672.537668]  btrfs_sync_file+0xbc/0x580
[21672.537672]  ? __seccomp_filter+0x333/0x500
[21672.537681]  __x64_sys_fdatasync+0x4b/0x90
[21672.537689]  do_syscall_64+0x5c/0x90
[21672.537696]  ? syscall_exit_to_user_mode+0x17/0x40
[21672.537701]  ? do_syscall_64+0x68/0x90
[21672.537704]  ? syscall_exit_to_user_mode+0x17/0x40
[21672.537708]  ? do_syscall_64+0x68/0x90
[21672.537712]  ? syscall_exit_to_user_mode+0x17/0x40
[21672.537715]  ? do_syscall_64+0x68/0x90
[21672.537718]  ? do_syscall_64+0x68/0x90
[21672.537722]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[21672.537729] RIP: 0033:0x7f899579ae9c
[21672.537749] RSP: 002b:00007f78a57f9650 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[21672.537754] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f899579ae9c
[21672.537757] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000010
[21672.537759] RBP: 00007f78a57f9660 R08: 0000000000000000 R09: 0000000000000001
[21672.537761] R10: 0000000000000000 R11: 0000000000000293 R12: 000055d1291be230
[21672.537764] R13: 000055d128f6f1b0 R14: 000055d1287843d8 R15: 000055d128f6f218
[21672.537771]  </TASK>
[21699.524273] sysrq: Show Blocked State
[21715.167298] sysrq: Show Blocked State
[21715.167633] task:worker          state:D stack:0     pid:9723  ppid:1      flags:0x00004002
[21715.167639] Call Trace:
[21715.167642]  <TASK>
[21715.167647]  __schedule+0x3cc/0x13e0
[21715.167657]  ? __schedule+0x3d4/0x13e0
[21715.167662]  schedule+0x5d/0xe0
[21715.167665]  io_schedule+0x42/0x70
[21715.167668]  folio_wait_bit_common+0x13d/0x370
[21715.167676]  ? __pfx_wake_page_function+0x10/0x10
[21715.167680]  folio_wait_writeback+0x28/0x80
[21715.167685]  extent_write_cache_pages+0x3ac/0x540
[21715.167692]  extent_writepages+0x5a/0x120
[21715.167694]  ? ttwu_queue_wakelist+0xbf/0x110
[21715.167700]  ? __pfx_end_bio_extent_writepage+0x10/0x10
[21715.167703]  do_writepages+0xc0/0x1d0
[21715.167707]  ? __remove_hrtimer+0x39/0x90
[21715.167711]  filemap_fdatawrite_wbc+0x5f/0x80
[21715.167714]  __filemap_fdatawrite_range+0x58/0x80
[21715.167719]  start_ordered_ops.constprop.0+0x73/0xd0
[21715.167723]  btrfs_sync_file+0xbc/0x580
[21715.167726]  ? __seccomp_filter+0x333/0x500
[21715.167732]  __x64_sys_fdatasync+0x4b/0x90
[21715.167738]  do_syscall_64+0x5c/0x90
[21715.167743]  ? syscall_exit_to_user_mode+0x17/0x40
[21715.167747]  ? do_syscall_64+0x68/0x90
[21715.167749]  ? syscall_exit_to_user_mode+0x17/0x40
[21715.167751]  ? do_syscall_64+0x68/0x90
[21715.167753]  ? syscall_exit_to_user_mode+0x17/0x40
[21715.167755]  ? do_syscall_64+0x68/0x90
[21715.167757]  ? do_syscall_64+0x68/0x90
[21715.167760]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[21715.167765] RIP: 0033:0x7f899579ae9c
[21715.167782] RSP: 002b:00007f78a57f9650 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[21715.167785] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f899579ae9c
[21715.167787] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000010
[21715.167789] RBP: 00007f78a57f9660 R08: 0000000000000000 R09: 0000000000000001
[21715.167790] R10: 0000000000000000 R11: 0000000000000293 R12: 000055d1291be230
[21715.167792] R13: 000055d128f6f1b0 R14: 000055d1287843d8 R15: 000055d128f6f218
[21715.167797]  </TASK>
[21715.167822] task:kworker/u48:5   state:D stack:0     pid:9080  ppid:2      flags:0x00004000
[21715.167826] Workqueue: writeback wb_workfn (flush-btrfs-1)
[21715.167830] Call Trace:
[21715.167832]  <TASK>
[21715.167833]  __schedule+0x3cc/0x13e0
[21715.167839]  schedule+0x5d/0xe0
[21715.167842]  io_schedule+0x42/0x70
[21715.167845]  blk_mq_get_tag+0x11a/0x2a0
[21715.167850]  ? __pfx_autoremove_wake_function+0x10/0x10
[21715.167855]  __blk_mq_alloc_requests+0x18e/0x2e0
[21715.167861]  blk_mq_submit_bio+0x2fb/0x5f0
[21715.167867]  __submit_bio+0xf5/0x180
[21715.167872]  submit_bio_noacct_nocheck+0x32e/0x370
[21715.167876]  ? submit_bio_noacct+0x71/0x4e0
[21715.167880]  btrfs_submit_bio+0x247/0x270
[21715.167887]  submit_one_bio+0xfa/0x120
[21715.167893]  submit_extent_page+0x15f/0x500
[21715.167898]  ? __pfx_page_mkclean_one+0x10/0x10
[21715.167905]  ? __pfx_invalid_mkclean_vma+0x10/0x10
[21715.167909]  __extent_writepage_io.constprop.0+0x273/0x590
[21715.167914]  ? writepage_delalloc+0x20/0x180
[21715.167918]  __extent_writepage+0x124/0x440
[21715.167922]  extent_write_cache_pages+0x15f/0x540
[21715.167927]  extent_writepages+0x5a/0x120
[21715.167930]  ? __pfx_end_bio_extent_writepage+0x10/0x10
[21715.167933]  do_writepages+0xc0/0x1d0
[21715.167936]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[21715.167942]  __writeback_single_inode+0x3d/0x360
[21715.167945]  ? _raw_spin_lock+0x13/0x40
[21715.167950]  writeback_sb_inodes+0x1ed/0x4b0
[21715.167955]  __writeback_inodes_wb+0x4c/0xf0
[21715.167958]  wb_writeback+0x172/0x2f0
[21715.167962]  wb_workfn+0x357/0x510
[21715.167964]  ? __schedule+0x3d4/0x13e0
[21715.167969]  process_one_work+0x1c7/0x3d0
[21715.167974]  worker_thread+0x4d/0x380
[21715.167978]  ? __pfx_worker_thread+0x10/0x10
[21715.167981]  kthread+0xe9/0x110
[21715.167985]  ? __pfx_kthread+0x10/0x10
[21715.167988]  ret_from_fork+0x2c/0x50
[21715.167996]  </TASK>

full dmesg attachment
https://bugzilla-attachments.redhat.com/attachment.cgi?id=1962385


Thanks,



--
Chris Murphy



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux