On Tue, May 10, 2022 at 04:49:35PM +0000, Adriano Silva wrote:
> As we can see, the same test done on the bcache0 device got only 1548 IOPS, which yielded only 6.3 MB/s.
>
> This is much more than any spinning HDD could give me, but many times less than the result obtained by the NVMe.

Hi,

bcache needs to do a lot of metadata work, which results in noticeable write amplification.

My testing with bcache (some years ago, and only with SATA SSDs) showed that bcache latency increases a lot with large amounts of dirty data, so I used to tune writeback_percent down, usually to 1, and to keep the cache device small, at around 40GB.

I also found performance to increase slightly when the bcache device was created with a 4k block size instead of the default 512 bytes. Still, quite a decrease in IOPS.

Maybe you could monitor with iostat; it shows the *_await columns, which might give some hints.

Matthias

> I've noticed in several tests, varying the number of jobs or increasing the block size, that the larger the blocks, the closer the bcache device's performance gets to that of the physical device. But the number of IOPS always seems to be limited to somewhere around 1500-1800 (maximum). By increasing the number of jobs I get better results and more IOPS, but if you divide the total IOPS by the number of jobs, you can see that IOPS are always limited to the 1500-1800 range per job.
>
> The commands used to configure bcache were:
>
> # echo writeback > /sys/block/bcache0/bcache/cache_mode
> # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
> ##
> ## Then I tried everything also with the commands below, but there was no improvement.
> ##
> # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
> # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us
>
> Monitoring with dstat, it is possible to see that once the fio command starts, all writes go to the cache device (a second partition of the NVMe) until the end of the test. The spinning disk is only written to some time later, when reads on the NVMe and writes on the spinning disk become visible (i.e., the background transfer of the data).
>
> --dsk/sdb---dsk/nvme0n1-dsk/bcache0 ---io/sdb----io/nvme0n1--io/bcache0 -net/total- ---load-avg--- --total-cpu-usage-- ---system-- ----system---- async
>  read  writ: read  writ: read  writ| read  writ: read  writ: read  writ| recv  send|  1m   5m  15m |usr sys idl wai stl| int   csw |     time     | #aio
>    0     0 :   0     0 :   0     0 |   0     0 :   0     0 :   0     0 |8462B 8000B|0.03 0.15 0.31|  1   0  99   0   0| 250   383 |09-05 15:19:47|   0
>    0     0 :4096B  454k:   0   336k|   0     0 :1.00   184 :   0   170 |4566B 4852B|0.03 0.15 0.31|  2   2  94   1   0|1277  3470 |09-05 15:19:48|   1B
>    0  8192B:   0  8022k:   0  6512k|   0  2.00 :   0  3388 :   0  3254 |3261B 2827B|0.11 0.16 0.32|  0   2  93   5   0|4397   16k |09-05 15:19:49|   1B
>    0     0 :   0  7310k:   0  6460k|   0     0 :   0  3240 :   0  3231 |6773B 6428B|0.11 0.16 0.32|  0   1  93   6   0|4190   16k |09-05 15:19:50|   1B
>    0     0 :   0  7313k:   0  6504k|   0     0 :   0  3252 :   0  3251 |6719B 6201B|0.11 0.16 0.32|  0   2  92   6   0|4482   16k |09-05 15:19:51|   1B
>    0     0 :   0  7313k:   0  6496k|   0     0 :   0  3251 :   0  3250 |4743B 4016B|0.11 0.16 0.32|  0   1  93   6   0|4243   16k |09-05 15:19:52|   1B
>    0     0 :   0  7329k:   0  6496k|   0     0 :   0  3289 :   0  3245 |6107B 6062B|0.11 0.16 0.32|  1   1  90   8   0|4706   18k |09-05 15:19:53|   1B
>    0     0 :   0  5373k:   0  4184k|   0     0 :   0  2946 :   0  2095 |6387B 6062B|0.26 0.19 0.33|  0   2  95   4   0|3774   12k |09-05 15:19:54|   1B
>    0     0 :   0  6966k:   0  5668k|   0     0 :   0  3270 :   0  2834 |7264B 7546B|0.26 0.19 0.33|  0   1  93   5   0|4214   15k |09-05 15:19:55|   1B
>    0     0 :   0  7271k:   0  6252k|   0     0 :   0  3258 :   0  3126 |5928B 4584B|0.26 0.19 0.33|  0   2  93   5   0|4156   16k |09-05 15:19:56|   1B
>    0     0 :   0  7419k:   0  6504k|   0     0 :   0  3308 :   0  3251 |5226B 5650B|0.26 0.19 0.33|  2   1  91   6   0|4433   16k |09-05 15:19:57|   1B
>    0     0 :   0  6444k:   0  5704k|   0     0 :   0  2873 :   0  2851 |6494B 8021B|0.26 0.19 0.33|  1   1  91   7   0|4352   16k |09-05 15:19:58|   0
>    0     0 :   0     0 :   0     0 |   0     0 :   0     0 :   0     0 |6030B 7204B|0.24 0.19 0.32|  0   0 100   0   0| 209   279 |09-05 15:19:59|   0
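P.S. For reference, the writeback_percent tuning mentioned above could be applied with something like the sketch below. The device name bcache0 is a placeholder for your setup, and the script only writes the sysfs attribute when it actually exists:

```shell
#!/bin/sh
# Sketch: lower the dirty-data target so writeback starts early.
# "bcache0" is a placeholder; adjust to your registered bcache device.
DEV=bcache0
WB=/sys/block/$DEV/bcache/writeback_percent

if [ -w "$WB" ]; then
    # Start background writeback at 1% dirty instead of the default 10%.
    echo 1 > "$WB"
else
    echo "note: $WB not writable; is $DEV registered?"
fi

# The 4k block size has to be chosen at format time and cannot be
# changed later, e.g. (destroys data on the named devices):
#   make-bcache -w 4k -B /dev/sdX -C /dev/nvme0n1p2
#
# To watch per-device latency, iostat's extended output shows the
# r_await/w_await columns:
#   iostat -x 1 nvme0n1 sdX bcache0
```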