> On 30 Jun 2022, at 15:39, Nikhil Kshirsagar <nkshirsagar@xxxxxxxxx> wrote:
>
> Yes, I understand, but if you see the graphs, for a 12gb random write
> IO (with a 15gb SSD, so enough cache in theory for entire write),
> bcache gets speeds very close to SLOW DISK! (3MB/s consistently, while
> dmcache gets 400mb/s consistently except the first run where it "warms
> up"), so that is why I wanted to understand whether there's any
> tunable to get the "close to ssd" or even 300-400MB/s speed (ssd speed
> is 500mb/s avg)
>
> Regards,
> Nikhil.

Every time around 900MB of data has been written into the cache device, the gc thread is woken up to work. And when the dirty data exceeds around 1.5G, the writeback thread starts and throttles front-end write speed. The more dirty data there is, the more the front-end I/Os are throttled. And when dirty data exceeds 70% of the cache size, which is around 10.5G in your case, all following I/Os go directly to the backing device and are not synced.

I guess this is why you observe slow I/O speed in your testing, which looks as expected IMHO.

Coly Li

>
> On Thu, 30 Jun 2022 at 13:06, Coly Li <colyli@xxxxxxx> wrote:
>>
>>> On 30 Jun 2022, at 15:26, Nikhil Kshirsagar <nkshirsagar@xxxxxxxxx> wrote:
>>>
>>> Thank you for the clarification. But my testing results show that even with a 15GB cache device, if I write 12gb, it still slows down, so you do not get "close to ssd" speed for such an IO write, even if it's smaller than the cache size.
>>>
>>> Attached results of testing comparing dm-cache with bcache. The command used was "fio --rw=randwrite --size=12G --ioengine=libaio --direct=1 --gtod_reduce=1 --iodepth=128 --bs=4k"
>>>
>>
>> I cannot tell why dmcache is so good from your performance numbers. But if the peak write speed is around 550MB/s, the test may take only around 20 seconds. What happens if the I/O test runs longer, e.g. 1 hour?
>>
>> BTW, people cannot get “close to ssd” speed on bcache: for each write/read I/O request, bcache updates the B+tree index, caches the data, writes the journal, and may split a B+tree node, and the I/O path may be interfered with by I/Os from gc and writeback. So it is good enough, but cannot be close to SSD speed.
>>
>> Coly Li
>>
>>>
>>> -Nikhil.
>>>
>>> On Thu, 30 Jun 2022 at 12:19, Coly Li <colyli@xxxxxxx> wrote:
>>>>
>>>>> On 30 Jun 2022, at 13:07, Nikhil Kshirsagar <nkshirsagar@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi Coly,
>>>>>
>>>>> even after turning it on by echo 1 into
>>>>> /sys/fs/bcache/<UUID>/internal/gc_after_writeback
>>>>
>>>> gc_after_writeback is a switch to trigger a gc operation when writeback has finished flushing all dirty data to the backing device, which might be good for future write I/Os.
>>>> It doesn’t help gc performance.
>>>>
>>>>>
>>>>> I still see [bcache_gc] threads appear about 70% into writing the 8 gb
>>>>> IO into 10 gb cache.. so with the result that 8gb write takes very
>>>>> long, in spite of having more than enough ssd cache for it..
>>>>>
>>>>
>>>> This is as designed. The gc thread is triggered every time 1/16 of the cache space is used; if there were no gc, the whole bcache process would very probably lock up, due to no space for metadata or cached data.
>>>>
>>>> This is why I suggest a larger cache device. Gc is unavoidable: when the cache device is small, all allocations wait for gc to make more free room. And in order to make more free space available, the dirty sectors must be written back to the backing device, which is why you see everything slow down.
>>>>
>>>> Coly Li
>>>>
>>>>> Regards,
>>>>> Nikhil.
>>>>>
>>>>> On Thu, 30 Jun 2022 at 09:54, Nikhil Kshirsagar <nkshirsagar@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Thanks Coly!
>>>>>>
>>>>>> Can garbage collection be turned off, by echo 1 into
>>>>>> /sys/fs/bcache/<UUID>/internal/gc_after_writeback ?
>>>>>>
>>>>>> The issue I'm seeing is, garbage collection causes write performance
>>>>>> (writeback mode) to drop whenever the cache gets 50% full.
>>>>>>
>>>>>> With a 10gb cache device, an 8 GB write (using fio randwrite) should
>>>>>> give SSD like speed, but it does not. I am wondering if it's due to the
>>>>>> gc threads.
>>>>>>
>>>>>> Regards,
>>>>>> Nikhil.
>>>>>>
>>>>>> On Sat, 25 Jun 2022 at 17:38, Coly Li <colyli@xxxxxxx> wrote:
>>>>>>>
>>>>>>>> On 25 Jun 2022, at 14:29, Nikhil Kshirsagar <nkshirsagar@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I've been doing some performance tests of bcache on 5.15.0-40-generic.
>>>>>>>>
>>>>>>>> The baseline figures for the fast and slow disk for random writes are
>>>>>>>> consistent at around 225MiB/s and 3046KiB/s.
>>>>>>>>
>>>>>>>> But the bcache results inexplicably drop sometimes to 10MiB/s, for
>>>>>>>> random write test using fio like this -
>>>>>>>>
>>>>>>>> fio --rw=randwrite --size=1G --ioengine=libaio --direct=1 --gtod_reduce=1 --iodepth=128 --bs=4k --name=MY_TEST1
>>>>>>>>
>>>>>>>> WRITE: bw=168MiB/s (176MB/s), 168MiB/s-168MiB/s (176MB/s-176MB/s), io=1024MiB (1074MB), run=6104-6104msec
>>>>>>>> WRITE: bw=283MiB/s (297MB/s), 283MiB/s-283MiB/s (297MB/s-297MB/s), io=1024MiB (1074MB), run=3621-3621msec
>>>>>>>> WRITE: bw=10.3MiB/s (10.9MB/s), 10.3MiB/s-10.3MiB/s (10.9MB/s-10.9MB/s), io=1024MiB (1074MB), run=98945-98945msec
>>>>>>>> WRITE: bw=8236KiB/s (8434kB/s), 8236KiB/s-8236KiB/s (8434kB/s-8434kB/s), io=1024MiB (1074MB), run=127317-127317msec
>>>>>>>> WRITE: bw=9657KiB/s (9888kB/s), 9657KiB/s-9657KiB/s (9888kB/s-9888kB/s), io=1024MiB (1074MB), run=108587-108587msec
>>>>>>>> WRITE: bw=4543KiB/s (4652kB/s), 4543KiB/s-4543KiB/s (4652kB/s-4652kB/s), io=1024MiB (1074MB), run=230819-230819msec
>>>>>>>>
>>>>>>>> This seems to happen after 2 runs of 1gb writes (cache disk is 4gb size)
>>>>>>>>
>>>>>>>> Some details are here - https://pastebin.com/V9mpLCbY , I will share
>>>>>>>> the full testing results soon, but just was wondering about this
>>>>>>>> performance drop for no apparent reason once the cache gets about 50%
>>>>>>>> full.
>>>>>>>
>>>>>>> It seems you are stuck on garbage collection. A 4GB cache is small, so garbage collection might be invoked quite frequently. Maybe you can check the output of 'top -H' to see whether there is a kernel thread named bcache_gc.
>>>>>>>
>>>>>>> Anyway, a 4GB cache is too small.
>>>>>>>
>>>>>>> Coly Li
>>>>>>>
>>>>
>>
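
The thresholds mentioned in the replies in this thread can be checked with some quick arithmetic. The following is only a minimal sketch, not bcache code: the function name bcache_thresholds is hypothetical, the 1/16 and 70% figures are taken from the replies in this thread, and the 10% writeback figure is an assumption inferred from the "around 1.5G" quoted for the 15GB cache.

```python
# Rough arithmetic for the bcache thresholds discussed in this thread.
# This is an illustration only; it does not read or change any bcache state.

GC_DIVISOR = 16          # from the thread: gc is triggered every 1/16 of cache space used
WRITEBACK_PERCENT = 10   # ASSUMPTION: inferred from "around 1.5G" on a 15GB cache
BYPASS_PERCENT = 70      # from the thread: above 70% dirty, writes go to the backing device


def bcache_thresholds(cache_gb):
    """Return approximate thresholds (in GB) for a cache of cache_gb gigabytes."""
    return {
        "gc_interval_gb": cache_gb / GC_DIVISOR,
        "writeback_start_gb": cache_gb * WRITEBACK_PERCENT / 100,
        "bypass_gb": cache_gb * BYPASS_PERCENT / 100,
    }


if __name__ == "__main__":
    for size in (4, 10, 15):  # cache sizes tested in the thread
        t = bcache_thresholds(size)
        print(
            f"{size:>2} GB cache: gc about every {t['gc_interval_gb']:.2f} GB written, "
            f"writeback throttling from about {t['writeback_start_gb']:.2f} GB dirty, "
            f"bypass above about {t['bypass_gb']:.2f} GB dirty"
        )
```

Under these assumptions, a 15GB cache gives about 0.94GB between gc runs, 1.5GB of dirty data before writeback throttling, and 10.5GB before writes bypass the cache, so the 12GB fio run crosses all three. The 8GB write on the 10GB cache likewise exceeds its 7GB bypass point.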