Re: extremely slow writes to array [now not degraded]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks Johannes,

On 13/11/2023 17.54, Johannes Truschnigg wrote:
You will need to generate some kind of call stack/graph info to start making
sense of the mess your running kernel gets itself into in the scenario you
described.

I imagine that looking at `perf` (`perf top` in particular; it should suffice
to paste a few instances of its topmost lines when the involved kernel threads
are spinning) would yield some useful data to kick off a potentially
productive debate about what could actually be going on.

On Fedora, I think `sudo dnf install perf` should set up everything you need
to get you going; then, look at `sudo perf top -F 999` for a while, and let
the list know what you see. (You can use your terminal emulator's flow control
features, available via Ctrl+S to enable and Ctrl+Q to disable, to "freeze"
the output to make it easier to copy it into a clipboard buffer.)


top shows
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  42858 root      20   0       0      0      0 R 100.0   0.0   4:18.20 kworker/u16:3+flush-9:127

Here are two snapshots of
	sudo perf top -F 999 - p42858

=======================
Samples: 99K of event 'cycles:P', 999 Hz, Event count (approx.): 69262030071 lost: 0/0 drop: 0/0
Overhead  Shared O  Symbol
  34.31%  [kernel]  [k] ext4_mb_scan_aligned
  11.82%  [kernel]  [k] mb_find_order_for_block
   7.31%  [kernel]  [k] ext4_mb_good_group
   5.88%  [kernel]  [k] ext4_mb_load_buddy_gfp
   5.41%  [kernel]  [k] ext4_mb_regular_allocator
   4.84%  [kernel]  [k] xas_descend
   4.69%  [kernel]  [k] filemap_get_entry
   3.65%  [kernel]  [k] ext4_mb_unload_buddy
   3.32%  [kernel]  [k] xas_load
   2.95%  [kernel]  [k] mb_find_extent
   2.63%  [kernel]  [k] ext4_get_group_info
   1.77%  [kernel]  [k] _raw_spin_trylock
   1.66%  [kernel]  [k] __filemap_get_folio
   1.63%  [kernel]  [k] _raw_read_unlock
   1.59%  [kernel]  [k] _raw_read_lock
   1.13%  [kernel]  [k] __rcu_read_unlock
   1.06%  [kernel]  [k] __rcu_read_lock
   1.04%  [kernel]  [k] ext4_mb_find_good_group_avg_frag_lists
   0.76%  [kernel]  [k] xas_start
   0.68%  [kernel]  [k] pagecache_get_page
   0.47%  [kernel]  [k] _raw_spin_unlock
   0.47%  [kernel]  [k] folio_mark_accessed
   0.34%  [kernel]  [k] folio_test_hugetlb
   0.30%  [kernel]  [k] mb_find_buddy
   0.13%  [kernel]  [k] __cond_resched
   0.02%  [kernel]  [k] native_write_msr
   0.02%  [kernel]  [k] find_get_stripe
   0.01%  [kernel]  [k] native_irq_return_iret
   0.01%  [kernel]  [k] timekeeping_advance
   0.01%  [kernel]  [k] _raw_spin_lock
   0.01%  [kernel]  [k] native_read_msr
   0.01%  [kernel]  [k] update_process_times
   0.01%  [kernel]  [k] __update_load_avg_cfs_rq
   0.01%  [kernel]  [k] native_apic_msr_eoi_write
   0.01%  [kernel]  [k] hrtimer_interrupt
   0.01%  [kernel]  [k] _find_next_and_bit
   0.00%  [kernel]  [k] ext4_ext_correct_indexes
   0.00%  [kernel]  [k] scheduler_tick
   0.00%  [kernel]  [k] _raw_spin_lock_irq
   0.00%  [kernel]  [k] lapic_next_deadline
   0.00%  [kernel]  [k] ext4_es_lookup_extent
   0.00%  [kernel]  [k] mpage_process_page_bufs
   0.00%  [kernel]  [k] raid5_compute_blocknr
=======================

Samples: 342K of event 'cycles:P', 999 Hz, Event count (approx.): 68792587269 lost: 0/0 drop: 0/0
Overhead  Shared O  Symbol
  34.01%  [kernel]  [k] ext4_mb_scan_aligned
  11.91%  [kernel]  [k] mb_find_order_for_block
   6.56%  [kernel]  [k] ext4_mb_good_group
   6.19%  [kernel]  [k] ext4_mb_load_buddy_gfp
   5.30%  [kernel]  [k] ext4_mb_regular_allocator
   4.80%  [kernel]  [k] xas_descend
   4.77%  [kernel]  [k] filemap_get_entry
   3.72%  [kernel]  [k] ext4_mb_unload_buddy
   3.38%  [kernel]  [k] xas_load
   2.96%  [kernel]  [k] mb_find_extent
   2.71%  [kernel]  [k] ext4_get_group_info
   1.77%  [kernel]  [k] _raw_spin_trylock
   1.76%  [kernel]  [k] _raw_read_unlock
   1.71%  [kernel]  [k] __filemap_get_folio
   1.56%  [kernel]  [k] _raw_read_lock
   1.19%  [kernel]  [k] __rcu_read_unlock
   1.18%  [kernel]  [k] ext4_mb_find_good_group_avg_frag_lists
   1.04%  [kernel]  [k] __rcu_read_lock
   0.80%  [kernel]  [k] xas_start
   0.65%  [kernel]  [k] pagecache_get_page
   0.56%  [kernel]  [k] folio_mark_accessed
   0.48%  [kernel]  [k] _raw_spin_unlock
   0.34%  [kernel]  [k] mb_find_buddy
   0.30%  [kernel]  [k] folio_test_hugetlb
   0.18%  [kernel]  [k] __cond_resched
   0.02%  [kernel]  [k] perf_adjust_freq_unthr_context
   0.01%  [kernel]  [k] trigger_load_balance
   0.01%  [kernel]  [k] _raw_spin_lock
   0.01%  [kernel]  [k] timekeeping_advance
   0.01%  [kernel]  [k] update_fast_timekeeper
   0.01%  [kernel]  [k] account_system_index_time
   0.01%  [kernel]  [k] __accumulate_pelt_segments
   0.01%  [kernel]  [k] __hrtimer_next_event_base
   0.01%  [kernel]  [k] update_irq_load_avg
   0.01%  [kernel]  [k] errseq_check
   0.00%  [kernel]  [k] cpuacct_account_field
   0.00%  [kernel]  [k] __queue_work
   0.00%  [kernel]  [k] native_read_msr
   0.00%  [kernel]  [k] __run_timers
   0.00%  [kernel]  [k] find_get_stripe
   0.00%  [kernel]  [k] native_irq_return_iret
   0.00%  [kernel]  [k] hrtimer_active
   0.00%  [kernel]  [k] native_apic_msr_eoi_write
   0.00%  [kernel]  [k] xas_find_marked
   0.00%  [kernel]  [k] hrtimer_interrupt
   0.00%  [kernel]  [k] _raw_spin_lock_irqsave
   0.00%  [kernel]  [k] timekeeping_update
   0.00%  [kernel]  [k] irqtime_account_irq
   0.00%  [kernel]  [k] scheduler_tick
   0.00%  [kernel]  [k] pvclock_gtod_notify
   0.00%  [kernel]  [k] tick_sched_do_timer
   0.00%  [kernel]  [k] task_tick_fair
   0.00%  [kernel]  [k] native_write_msr
   0.00%  [kernel]  [k] __sysvec_apic_timer_interrupt
   0.00%  [kernel]  [k] update_sd_lb_stats.constprop.0
   0.00%  [kernel]  [k] asm_sysvec_apic_timer_interrupt
   0.00%  [kernel]  [k] update_curr
   0.00%  [kernel]  [k] _raw_spin_unlock_irqrestore
   0.00%  [kernel]  [k] ktime_get
   0.00%  [kernel]  [k] wq_worker_tick
   0.00%  [kernel]  [k] __radix_tree_lookup
   0.00%  [kernel]  [k] arch_scale_freq_tick
   0.00%  [kernel]  [k] native_sched_clock
   0.00%  [kernel]  [k] __update_load_avg_cfs_rq
   0.00%  [kernel]  [k] kthread_data
   0.00%  [kernel]  [k] raid5_get_active_stripe
   0.00%  [kernel]  [k] ext4_mb_new_blocks
   0.00%  [kernel]  [k] calc_global_load_tick
   0.00%  [kernel]  [k] irq_work_run_list
   0.00%  [kernel]  [k] _raw_spin_lock_irq
   0.00%  [kernel]  [k] lapic_next_deadline
   0.00%  [kernel]  [k] error_entry
   0.00%  [kernel]  [k] kmem_cache_alloc
   0.00%  [kernel]  [k] filemap_get_folios_tag
   0.00%  [kernel]  [k] intel_pmu_disable_all
   0.00%  [kernel]  [k] crypto_shash_update
   0.00%  [kernel]  [k] ext4_mb_use_preallocated.constprop.0
   0.00%  [kernel]  [k] raid5_compute_sector
   0.00%  [kernel]  [k] ext4_sb_block_valid
   0.00%  [kernel]  [k] read_tsc
   0.00%  [kernel]  [k] ext4_bio_write_folio
   0.00%  [kernel]  [k] ext4_fill_raw_inode
   0.00%  [kernel]  [k] crc32c_pcl_intel_update
   0.00%  [kernel]  [k] memset_orig
   0.00%  [kernel]  [k] start_this_handle
   0.00%  [kernel]  [k] update_load_avg
   0.00%  [kernel]  [k] bio_add_page
=======================

$ sudo sysctl -a|grep -i dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
=======================

$ cat /proc/meminfo
MemTotal:       32704840 kB
MemFree:          539904 kB
MemAvailable:   28070392 kB
Buffers:          280668 kB
Cached:         27201712 kB
SwapCached:           32 kB
Active:          4388732 kB
Inactive:       25812468 kB
Active(anon):    3118472 kB
Inactive(anon):    16500 kB
Active(file):    1270260 kB
Inactive(file): 25795968 kB
Unevictable:      174936 kB
Mlocked:               0 kB
SwapTotal:      16777212 kB
SwapFree:       16776956 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:            169872 kB
Writeback:             0 kB
AnonPages:       2893880 kB
Mapped:           958348 kB
Shmem:            416040 kB
KReclaimable:     803204 kB
Slab:            1052924 kB
SReclaimable:     803204 kB
SUnreclaim:       249720 kB
KernelStack:       23616 kB
PageTables:        43176 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    33129632 kB
Committed_AS:   11705468 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       69000 kB
VmallocChunk:          0 kB
Percpu:             6272 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      268428 kB
DirectMap2M:     7972864 kB
DirectMap1G:    25165824 kB

HTH

--
Eyal at Home (eyal@xxxxxxxxxxxxxx)




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux