Re: bcache create hundred of kernel threads - is this expected?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/25/21 3:31 PM, Kai Krakow wrote:
> Hi!
> 
> When looking at htop, I see hundreds of kernel threads dedicated to
> bcache workers. Many of them go in and out of D state at the same time
> which may explain the spikes in load I'm seeing especially with
> writeback enabled (easily surpassing 200 loadavg).
> 
> From some older observations, the system feels smoother when turning
> bcache off (cache_mode = none) but obviously many file system
> operations are much slower. With bcache enabled, I'm seeing
> spontaneous desktop lags every once in a while.
> 
> My bcache setup runs mainly on btrfs, with one additional xfs
> partition where I've put some background database workload which isn't
> exactly friends with btrfs.
> 
> I'm wondering if this is expected. No other block-related driver seems
> to be such a heavy user of kernel threads:
> ```
> # uname -a
> Linux jupiter 5.10.9-gentoo #3 SMP PREEMPT Sun Jan 24 22:43:57 CET
> 2021 x86_64 Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz GenuineIntel
> GNU/Linux
> 
> # ps faux | grep bcache | wc -l
> 284
> 
> # lsblk
> NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sda           8:0    0 931,5G  0 disk
> ├─sda1        8:1    0     2G  0 part /efi
> ├─sda2        8:2    0 865,5G  0 part
> │ ├─bcache0 254:0    0   2,7T  0 disk # pooled with bcache4
> │ ├─bcache1 254:128  0   2,7T  0 disk # pooled with bcache4
> │ ├─bcache2 254:256  0 925,5G  0 disk # pooled with bcache4
> │ ├─bcache3 254:384  0 931,5G  0 disk /mnt/xfs-storage
> │ └─bcache4 254:512  0 925,5G  0 disk /mnt/btrfs-pool
> └─sda3        8:3    0    64G  0 part [SWAP]
> sdb           8:16   0   2,7T  0 disk
> ├─sdb1        8:17   0    32G  0 part [SWAP]
> └─sdb2        8:18   0   2,7T  0 part
>  └─bcache0 254:0    0   2,7T  0 disk
> sdc           8:32   0 931,5G  0 disk
> ├─sdc1        8:33   0     6G  0 part [SWAP]
> └─sdc2        8:34   0 925,5G  0 part
>  └─bcache4 254:512  0 925,5G  0 disk /mnt/btrfs-pool
> sdd           8:48   0 931,5G  0 disk
> ├─sdd1        8:49   0     6G  0 part [SWAP]
> └─sdd2        8:50   0 925,5G  0 part
>  └─bcache2 254:256  0 925,5G  0 disk
> sde           8:64   0   3,7T  0 disk
> ├─sde1        8:65   0     6G  0 part [SWAP]
> ├─sde2        8:66   0   2,7T  0 part
> │ └─bcache1 254:128  0   2,7T  0 disk
> └─sde3        8:67   0 931,5G  0 part
>  └─bcache3 254:384  0 931,5G  0 disk /mnt/xfs-storage
> sdf           8:80   1   7,5G  0 disk
> └─sdf1        8:81   1   7,5G  0 part
> 
> # dmesg | grep bcache
> [    2.079349] bcache: bch_journal_replay() journal replay done, 503
> keys in 131 entries, seq 50942189
> [    2.079739] bcache: register_cache() registered cache device sda2
> [    2.079837] bcache: register_bdev() registered backing device sdb2
> [    2.286757] bcache: bch_cached_dev_attach() Caching sdb2 as bcache0
> on set df11a09d-b91a-4cd3-80e2-2684f5b9c1b2
> [    2.286846] bcache: register_bdev() registered backing device sde2
> [    2.385586] BTRFS: device label system devid 5 transid 2310589
> /dev/bcache0 scanned by systemd-udevd (341)
> [    2.394597] bcache: bch_cached_dev_attach() Caching sde2 as bcache1
> on set df11a09d-b91a-4cd3-80e2-2684f5b9c1b2
> [    2.394688] bcache: register_bdev() registered backing device sdd2
> [    2.507326] bcache: bch_cached_dev_attach() Caching sdd2 as bcache2
> on set df11a09d-b91a-4cd3-80e2-2684f5b9c1b2
> [    2.507416] BTRFS: device label system devid 3 transid 2310589
> /dev/bcache1 scanned by systemd-udevd (341)
> [    2.507422] bcache: register_bdev() registered backing device sde3
> [    2.604546] bcache: bch_cached_dev_attach() Caching sde3 as bcache3
> on set df11a09d-b91a-4cd3-80e2-2684f5b9c1b2
> [    2.604645] bcache: register_bdev() registered backing device sdc2
> [    2.708189] bcache: bch_cached_dev_attach() Caching sdc2 as bcache4
> on set df11a09d-b91a-4cd3-80e2-2684f5b9c1b2
> [    2.731318] BTRFS: device label system devid 2 transid 2310589
> /dev/bcache2 scanned by systemd-udevd (339)
> [    2.748793] BTRFS: device label system devid 1 transid 2310589
> /dev/bcache4 scanned by systemd-udevd (336)
> [    3.201845] BTRFS info (device bcache4): use zstd compression, level 3
> [    3.201847] BTRFS info (device bcache4): enabling ssd optimizations
> [    3.201848] BTRFS info (device bcache4): using free space tree
> [    3.201848] BTRFS info (device bcache4): has skinny extents
> [    3.212303] BTRFS info (device bcache4): bdev /dev/bcache4 errs: wr
> 0, rd 0, flush 0, corrupt 1, gen 0
> # ignore the corrupt 1, this was a one-time occurrence while scrubbing
> # I can no longer reproduce this, maybe cosmic rays flipped a bit
> during scrubbing ;-)
> [    3.538177] BTRFS info (device bcache4): start tree-log replay
> [    6.187652] BTRFS info (device bcache4): using free space tree
> [    6.242252] XFS (bcache3): Mounting V5 Filesystem
> [    6.386719] XFS (bcache3): Starting recovery (logdev: internal)
> [    6.548018] XFS (bcache3): Ending recovery (logdev: internal)
> [    6.989625] bcache: register_bcache() error : device already registered
> [    6.991794] bcache: register_bcache() error : device already registered
> [    7.007219] bcache: register_bcache() error : device already registered
> [    7.300065] bcache: register_bcache() error : device already registered
> [    7.301004] bcache: register_bcache() error : device already registered
> [    7.370755] bcache: register_bcache() error : device already registered
> ```
> 
> Here's an excerpt of what I'm seeing:
> ```
> # ps faux|grep bcache
> root         180  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_ [bcache]
> root         381  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
> [bcache_gc]
> root         399  0.0  0.0      0     0 ?        S    Jan24   0:01  \_
> [bcache_allocato]
> root         400  0.0  0.0      0     0 ?        S    Jan24   0:10  \_
> [bcache_gc]
> root         402  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
> [bcache_writebac]
> root         403  0.2  0.0      0     0 ?        S    Jan24   1:23  \_
> [bcache_writebac]
> root         408  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
> [bcache_status_u]
> root         410  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
> [bcache_writebac]
> root         411  0.2  0.0      0     0 ?        S    Jan24   1:23  \_
> [bcache_writebac]
> root         417  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
> [bcache_status_u]
> root         419  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
> [bcache_writebac]
> root         420  0.1  0.0      0     0 ?        S    Jan24   0:51  \_
> [bcache_writebac]
> root         425  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
> [bcache_status_u]
> root         427  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
> [bcache_writebac]
> root         428  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
> [bcache_writebac]
> root         433  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
> [bcache_status_u]
> root         435  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
> [bcache_writebac]
> root         436  0.1  0.0      0     0 ?        S    Jan24   0:51  \_
> [bcache_writebac]
> root         441  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
> [bcache_status_u]
> root         570  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
> [xfs-buf/bcache3]
> root         571  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
> [xfs-conv/bcache]
> root         572  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
> [xfs-cil/bcache3]
> root         575  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
> [xfs-log/bcache3]
> root         576  0.0  0.0      0     0 ?        S    Jan24   0:03  \_
> [xfsaild/bcache3]
> root       72935  0.0  0.0      0     0 ?        I    01:08   0:00  \_
> [kworker/4:33-bcache_writeback_wq]
> root       72946  0.0  0.0      0     0 ?        I    01:08   0:00  \_
> [kworker/4:88-bcache]
> root       73171  0.0  0.0      0     0 ?        I    01:15   0:00  \_
> [kworker/4:122-bcache]
> root       74312  0.0  0.0      0     0 ?        I    01:57   0:00  \_
> [kworker/4:205-bcache_writeback_wq]
> root       74649  0.0  0.0      0     0 ?        I    02:09   0:00  \_
> [kworker/4:177-bcache]
> root       74658  0.0  0.0      0     0 ?        I    02:09   0:00  \_
> [kworker/4:238-bcache]
> root       76622  0.0  0.0      0     0 ?        I    03:08   0:00  \_
> [kworker/4:222-bcache]
> root       76658  0.0  0.0      0     0 ?        I    03:08   0:00  \_
> [kworker/4:318-bcache]
> ...
> root      111243  0.0  0.0      0     0 ?        I    03:49   0:00  \_
> [kworker/4:165-bcache_writeback_wq]
> root      111628  0.0  0.0      0     0 ?        I    04:05   0:00  \_
> [kworker/4:11-bcache_writeback_wq]
> root      111643  0.0  0.0      0     0 ?        I    04:05   0:00  \_
> [kworker/4:124-bcache_writeback_wq]
> root      111679  0.0  0.0      0     0 ?        I    04:05   0:00  \_
> [kworker/4:245-bcache]
> root      111697  0.0  0.0      0     0 ?        I    04:05   0:00  \_
> [kworker/4:295-bcache_writeback_wq]
> root      111720  0.0  0.0      0     0 ?        I    04:05   0:00  \_
> [kworker/4:337-bcache]
> root      111860  0.0  0.0      0     0 ?        I    04:09   0:00  \_
> [kworker/4:386-bcache]
> root      111862  0.0  0.0      0     0 ?        I    04:09   0:00  \_
> [kworker/4:390-bcache_writeback_wq]
> root      111866  0.0  0.0      0     0 ?        I    04:09   0:00  \_
> [kworker/4:394-bcache_writeback_wq]
> root      111867  0.0  0.0      0     0 ?        I    04:09   0:00  \_
> [kworker/4:395-bcache]
> root      111880  0.0  0.0      0     0 ?        I    04:09   0:00  \_
> [kworker/4:408-bcache_writeback_wq]
> root      111882  0.0  0.0      0     0 ?        I    04:09   0:00  \_
> [kworker/4:410-bcache]
> root      111896  0.0  0.0      0     0 ?        I    04:09   0:00  \_
> [kworker/4:424-bcache]
> root      111897  0.0  0.0      0     0 ?        I    04:09   0:00  \_
> [kworker/4:425-bcache]
> ...
> root      114184  0.0  0.0      0     0 ?        I    05:28   0:00  \_
> [kworker/4:181-bcache_writeback_wq]
> root      114186  0.0  0.0      0     0 ?        I    05:28   0:00  \_
> [kworker/4:187-bcache_writeback_wq]
> root      114187  0.0  0.0      0     0 ?        I    05:28   0:00  \_
> [kworker/4:188-bcache_writeback_wq]
> root      114193  0.0  0.0      0     0 ?        I    05:28   0:00  \_
> [kworker/4:207-bcache_writeback_wq]
> ...
> root      117073  0.0  0.0      0     0 ?        I    06:48   0:00  \_
> [kworker/4:182-bcache_writeback_wq]
> root      117074  0.0  0.0      0     0 ?        I    06:48   0:00  \_
> [kworker/4:190-bcache_writeback_wq]
> root      117077  0.0  0.0      0     0 ?        I    06:48   0:00  \_
> [kworker/4:218-bcache]
> root      117081  0.0  0.0      0     0 ?        I    06:48   0:00  \_
> [kworker/4:239-bcache_writeback_wq]
> root      117084  0.0  0.0      0     0 ?        I    06:48   0:00  \_
> [kworker/4:249-bcache]
> root      117087  0.0  0.0      0     0 ?        I    06:48   0:00  \_
> [kworker/4:264-bcache_writeback_wq]
> root      117088  0.0  0.0      0     0 ?        I    06:48   0:00  \_
> [kworker/4:269-bcache]
> root      118151  0.0  0.0      0     0 ?        I    06:53   0:00  \_
> [kworker/4:299-bcache]
> root      118152  0.0  0.0      0     0 ?        I    06:53   0:00  \_
> [kworker/4:300-bcache]
> root      118154  0.0  0.0      0     0 ?        I    06:53   0:00  \_
> [kworker/4:305-bcache]
> root      118159  0.0  0.0      0     0 ?        I    06:53   0:00  \_
> [kworker/4:313-bcache]
> root      118170  0.0  0.0      0     0 ?        I    06:53   0:00  \_
> [kworker/4:326-bcache_writeback_wq]
> root      118172  0.0  0.0      0     0 ?        I    06:53   0:00  \_
> [kworker/4:328-bcache]
> root      118179  0.0  0.0      0     0 ?        I    06:53   0:00  \_
> [kworker/4:335-bcache]
> root      118184  0.0  0.0      0     0 ?        I    06:53   0:00  \_
> [kworker/4:341-bcache_writeback_wq]
> ...
> ```
> 
> During high write work-loads, I'm sometimes seeing over 2000 kthreads
> (most of them are bcache workers).

They are not kernel thread exactly. Indeed they are kernel works
attaching on some kernel work queue. Bcache does I/Os in async way by
kwoerks (by explicit work queue schedule, or by implicit bcache closure
schedule), the "kthread" you observed is a context of an I/O kworker.

That is, the more parallel I/Os flying, the more such contexts you may
observe. This is as-designed for current bcache.

Thanks.

Coly Li




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux