bcache create hundred of kernel threads - is this expected?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi!

When looking at htop, I see hundreds of kernel threads dedicated to
bcache workers. Many of them go in and out of D state at the same time
which may explain the spikes in load I'm seeing especially with
writeback enabled (easily surpassing 200 loadavg).

>From some older observations, the system feels smoother when turning
bcache off (cache_mode = none) but obviously many file system
operations are much slower. With bcache enabled, I'm seeing
spontaneous desktop lags every once in a while.

My bcache setup runs mainly on btrfs, with one additional xfs
partition where I've put some background database workload which isn't
exactly friends with btrfs.

I'm wondering if this is expected. No other block-related driver seems
to be such a heavy user of kernel threads:
```
# uname -a
Linux jupiter 5.10.9-gentoo #3 SMP PREEMPT Sun Jan 24 22:43:57 CET
2021 x86_64 Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz GenuineIntel
GNU/Linux

# ps faux | grep bcache | wc -l
284

# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 931,5G  0 disk
├─sda1        8:1    0     2G  0 part /efi
├─sda2        8:2    0 865,5G  0 part
│ ├─bcache0 254:0    0   2,7T  0 disk # pooled with bcache4
│ ├─bcache1 254:128  0   2,7T  0 disk # pooled with bcache4
│ ├─bcache2 254:256  0 925,5G  0 disk # pooled with bcache4
│ ├─bcache3 254:384  0 931,5G  0 disk /mnt/xfs-storage
│ └─bcache4 254:512  0 925,5G  0 disk /mnt/btrfs-pool
└─sda3        8:3    0    64G  0 part [SWAP]
sdb           8:16   0   2,7T  0 disk
├─sdb1        8:17   0    32G  0 part [SWAP]
└─sdb2        8:18   0   2,7T  0 part
 └─bcache0 254:0    0   2,7T  0 disk
sdc           8:32   0 931,5G  0 disk
├─sdc1        8:33   0     6G  0 part [SWAP]
└─sdc2        8:34   0 925,5G  0 part
 └─bcache4 254:512  0 925,5G  0 disk /mnt/btrfs-pool
sdd           8:48   0 931,5G  0 disk
├─sdd1        8:49   0     6G  0 part [SWAP]
└─sdd2        8:50   0 925,5G  0 part
 └─bcache2 254:256  0 925,5G  0 disk
sde           8:64   0   3,7T  0 disk
├─sde1        8:65   0     6G  0 part [SWAP]
├─sde2        8:66   0   2,7T  0 part
│ └─bcache1 254:128  0   2,7T  0 disk
└─sde3        8:67   0 931,5G  0 part
 └─bcache3 254:384  0 931,5G  0 disk /mnt/xfs-storage
sdf           8:80   1   7,5G  0 disk
└─sdf1        8:81   1   7,5G  0 part

# dmesg | grep bcache
[    2.079349] bcache: bch_journal_replay() journal replay done, 503
keys in 131 entries, seq 50942189
[    2.079739] bcache: register_cache() registered cache device sda2
[    2.079837] bcache: register_bdev() registered backing device sdb2
[    2.286757] bcache: bch_cached_dev_attach() Caching sdb2 as bcache0
on set df11a09d-b91a-4cd3-80e2-2684f5b9c1b2
[    2.286846] bcache: register_bdev() registered backing device sde2
[    2.385586] BTRFS: device label system devid 5 transid 2310589
/dev/bcache0 scanned by systemd-udevd (341)
[    2.394597] bcache: bch_cached_dev_attach() Caching sde2 as bcache1
on set df11a09d-b91a-4cd3-80e2-2684f5b9c1b2
[    2.394688] bcache: register_bdev() registered backing device sdd2
[    2.507326] bcache: bch_cached_dev_attach() Caching sdd2 as bcache2
on set df11a09d-b91a-4cd3-80e2-2684f5b9c1b2
[    2.507416] BTRFS: device label system devid 3 transid 2310589
/dev/bcache1 scanned by systemd-udevd (341)
[    2.507422] bcache: register_bdev() registered backing device sde3
[    2.604546] bcache: bch_cached_dev_attach() Caching sde3 as bcache3
on set df11a09d-b91a-4cd3-80e2-2684f5b9c1b2
[    2.604645] bcache: register_bdev() registered backing device sdc2
[    2.708189] bcache: bch_cached_dev_attach() Caching sdc2 as bcache4
on set df11a09d-b91a-4cd3-80e2-2684f5b9c1b2
[    2.731318] BTRFS: device label system devid 2 transid 2310589
/dev/bcache2 scanned by systemd-udevd (339)
[    2.748793] BTRFS: device label system devid 1 transid 2310589
/dev/bcache4 scanned by systemd-udevd (336)
[    3.201845] BTRFS info (device bcache4): use zstd compression, level 3
[    3.201847] BTRFS info (device bcache4): enabling ssd optimizations
[    3.201848] BTRFS info (device bcache4): using free space tree
[    3.201848] BTRFS info (device bcache4): has skinny extents
[    3.212303] BTRFS info (device bcache4): bdev /dev/bcache4 errs: wr
0, rd 0, flush 0, corrupt 1, gen 0
# ignore the corrupt 1, this was a one-time occurrence while scrubbing
# I can no longer reproduce this, maybe cosmic rays flipped a bit
during scrubbing ;-)
[    3.538177] BTRFS info (device bcache4): start tree-log replay
[    6.187652] BTRFS info (device bcache4): using free space tree
[    6.242252] XFS (bcache3): Mounting V5 Filesystem
[    6.386719] XFS (bcache3): Starting recovery (logdev: internal)
[    6.548018] XFS (bcache3): Ending recovery (logdev: internal)
[    6.989625] bcache: register_bcache() error : device already registered
[    6.991794] bcache: register_bcache() error : device already registered
[    7.007219] bcache: register_bcache() error : device already registered
[    7.300065] bcache: register_bcache() error : device already registered
[    7.301004] bcache: register_bcache() error : device already registered
[    7.370755] bcache: register_bcache() error : device already registered
```

Here's an excerpt of what I'm seeing:
```
# ps faux|grep bcache
root         180  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_ [bcache]
root         381  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
[bcache_gc]
root         399  0.0  0.0      0     0 ?        S    Jan24   0:01  \_
[bcache_allocato]
root         400  0.0  0.0      0     0 ?        S    Jan24   0:10  \_
[bcache_gc]
root         402  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
[bcache_writebac]
root         403  0.2  0.0      0     0 ?        S    Jan24   1:23  \_
[bcache_writebac]
root         408  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
[bcache_status_u]
root         410  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
[bcache_writebac]
root         411  0.2  0.0      0     0 ?        S    Jan24   1:23  \_
[bcache_writebac]
root         417  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
[bcache_status_u]
root         419  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
[bcache_writebac]
root         420  0.1  0.0      0     0 ?        S    Jan24   0:51  \_
[bcache_writebac]
root         425  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
[bcache_status_u]
root         427  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
[bcache_writebac]
root         428  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
[bcache_writebac]
root         433  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
[bcache_status_u]
root         435  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
[bcache_writebac]
root         436  0.1  0.0      0     0 ?        S    Jan24   0:51  \_
[bcache_writebac]
root         441  0.0  0.0      0     0 ?        S    Jan24   0:00  \_
[bcache_status_u]
root         570  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
[xfs-buf/bcache3]
root         571  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
[xfs-conv/bcache]
root         572  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
[xfs-cil/bcache3]
root         575  0.0  0.0      0     0 ?        I<   Jan24   0:00  \_
[xfs-log/bcache3]
root         576  0.0  0.0      0     0 ?        S    Jan24   0:03  \_
[xfsaild/bcache3]
root       72935  0.0  0.0      0     0 ?        I    01:08   0:00  \_
[kworker/4:33-bcache_writeback_wq]
root       72946  0.0  0.0      0     0 ?        I    01:08   0:00  \_
[kworker/4:88-bcache]
root       73171  0.0  0.0      0     0 ?        I    01:15   0:00  \_
[kworker/4:122-bcache]
root       74312  0.0  0.0      0     0 ?        I    01:57   0:00  \_
[kworker/4:205-bcache_writeback_wq]
root       74649  0.0  0.0      0     0 ?        I    02:09   0:00  \_
[kworker/4:177-bcache]
root       74658  0.0  0.0      0     0 ?        I    02:09   0:00  \_
[kworker/4:238-bcache]
root       76622  0.0  0.0      0     0 ?        I    03:08   0:00  \_
[kworker/4:222-bcache]
root       76658  0.0  0.0      0     0 ?        I    03:08   0:00  \_
[kworker/4:318-bcache]
...
root      111243  0.0  0.0      0     0 ?        I    03:49   0:00  \_
[kworker/4:165-bcache_writeback_wq]
root      111628  0.0  0.0      0     0 ?        I    04:05   0:00  \_
[kworker/4:11-bcache_writeback_wq]
root      111643  0.0  0.0      0     0 ?        I    04:05   0:00  \_
[kworker/4:124-bcache_writeback_wq]
root      111679  0.0  0.0      0     0 ?        I    04:05   0:00  \_
[kworker/4:245-bcache]
root      111697  0.0  0.0      0     0 ?        I    04:05   0:00  \_
[kworker/4:295-bcache_writeback_wq]
root      111720  0.0  0.0      0     0 ?        I    04:05   0:00  \_
[kworker/4:337-bcache]
root      111860  0.0  0.0      0     0 ?        I    04:09   0:00  \_
[kworker/4:386-bcache]
root      111862  0.0  0.0      0     0 ?        I    04:09   0:00  \_
[kworker/4:390-bcache_writeback_wq]
root      111866  0.0  0.0      0     0 ?        I    04:09   0:00  \_
[kworker/4:394-bcache_writeback_wq]
root      111867  0.0  0.0      0     0 ?        I    04:09   0:00  \_
[kworker/4:395-bcache]
root      111880  0.0  0.0      0     0 ?        I    04:09   0:00  \_
[kworker/4:408-bcache_writeback_wq]
root      111882  0.0  0.0      0     0 ?        I    04:09   0:00  \_
[kworker/4:410-bcache]
root      111896  0.0  0.0      0     0 ?        I    04:09   0:00  \_
[kworker/4:424-bcache]
root      111897  0.0  0.0      0     0 ?        I    04:09   0:00  \_
[kworker/4:425-bcache]
...
root      114184  0.0  0.0      0     0 ?        I    05:28   0:00  \_
[kworker/4:181-bcache_writeback_wq]
root      114186  0.0  0.0      0     0 ?        I    05:28   0:00  \_
[kworker/4:187-bcache_writeback_wq]
root      114187  0.0  0.0      0     0 ?        I    05:28   0:00  \_
[kworker/4:188-bcache_writeback_wq]
root      114193  0.0  0.0      0     0 ?        I    05:28   0:00  \_
[kworker/4:207-bcache_writeback_wq]
...
root      117073  0.0  0.0      0     0 ?        I    06:48   0:00  \_
[kworker/4:182-bcache_writeback_wq]
root      117074  0.0  0.0      0     0 ?        I    06:48   0:00  \_
[kworker/4:190-bcache_writeback_wq]
root      117077  0.0  0.0      0     0 ?        I    06:48   0:00  \_
[kworker/4:218-bcache]
root      117081  0.0  0.0      0     0 ?        I    06:48   0:00  \_
[kworker/4:239-bcache_writeback_wq]
root      117084  0.0  0.0      0     0 ?        I    06:48   0:00  \_
[kworker/4:249-bcache]
root      117087  0.0  0.0      0     0 ?        I    06:48   0:00  \_
[kworker/4:264-bcache_writeback_wq]
root      117088  0.0  0.0      0     0 ?        I    06:48   0:00  \_
[kworker/4:269-bcache]
root      118151  0.0  0.0      0     0 ?        I    06:53   0:00  \_
[kworker/4:299-bcache]
root      118152  0.0  0.0      0     0 ?        I    06:53   0:00  \_
[kworker/4:300-bcache]
root      118154  0.0  0.0      0     0 ?        I    06:53   0:00  \_
[kworker/4:305-bcache]
root      118159  0.0  0.0      0     0 ?        I    06:53   0:00  \_
[kworker/4:313-bcache]
root      118170  0.0  0.0      0     0 ?        I    06:53   0:00  \_
[kworker/4:326-bcache_writeback_wq]
root      118172  0.0  0.0      0     0 ?        I    06:53   0:00  \_
[kworker/4:328-bcache]
root      118179  0.0  0.0      0     0 ?        I    06:53   0:00  \_
[kworker/4:335-bcache]
root      118184  0.0  0.0      0     0 ?        I    06:53   0:00  \_
[kworker/4:341-bcache_writeback_wq]
...
```

During high write work-loads, I'm sometimes seeing over 2000 kthreads
(most of them are bcache workers).

TIA
Kai




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux