Re: Bcache in writes direct with fsync. Are IOPS limited?



On Wed, May 11, 2022 at 12:58:48PM +0000, Adriano Silva wrote:
> Thank you for your answer!
> > bcache needs to do a lot of metadata work, resulting in a noticeable
> > write amplification. My testing with bcache (some years ago and only with
> > SATA SSDs) showed that bcache latency increases a lot with high amounts
> > of dirty data
> I'm testing with empty devices, no data.
> Wouldn't write amplification be noticeable in dstat? Because it doesn't seem significant during the tests, since I monitor reads and writes in all disks in dstat.

Yes, you are right, that would be visible. I was misled by the ~3k writes
to nvme (vs. ~1.5k writes from fio), but the same ~3k writes are on

> > I also found performance to increase slightly when a bcache device
> > was created with a 4k block size instead of the default 512 bytes.
> Are you talking about changing the block size for the cache device or the backing device?

Neither. It was the "-w" (block size) argument to make-bcache. I found an
old logfile from my tests. Both the HDD and the SSD showed up as
512-byte-sector devices, and the command to create the bcache device was
    make-bcache --data_offset 2048 --wipe-bcache -w 4k -C /dev/sde1 -B /dev/sdb
In /sys/block/bcacheX/queue/hw_sector_size it then says "4096".
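
To compare what the kernel reports for the underlying disks and for the
resulting bcache device, something like the following could be used. It is
only a minimal sketch: the device names in the example call (sdb, sde,
bcache0) are placeholders, and the SYSFS_ROOT variable is a hypothetical
knob added here so the function can be tried against a scratch directory.

```shell
# Print the hardware sector size the kernel reports for each named device.
# SYSFS_ROOT defaults to /sys/block; it is an assumption of this sketch,
# not something bcache itself knows about.
show_sector_sizes() {
    root="${SYSFS_ROOT:-/sys/block}"
    for dev in "$@"; do
        f="$root/$dev/queue/hw_sector_size"
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$dev" "$(cat "$f")"
        else
            printf '%s: (not present)\n' "$dev"
        fi
    done
}

# Example call (device names are placeholders):
# show_sector_sizes sdb sde bcache0
```

With a setup like the one above, the two raw disks would be expected to
report 512 and the bcache device 4096.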

> But when I remove the fsync flag in the test with fio, which tells the application to wait for the write response, the 4K write happens much faster, reaching 73.6 MB/s and 17k IOPS. This is half the device's performance, but it's more than enough for my case. The fsync flag makes no significant difference to the performance of my flash disk when testing directly on it. The fact that bcache speeds up when the fsync flag is removed makes me believe that bcache is not slow to write, but for some reason, bcache is taking a while to respond that the write is complete. I think that should be the point!

I can't claim to fully understand what fsync does (or how a block
device driver is supposed to handle it), but this might account for the
roughly doubled writes shown with dstat as opposed to the fio results.

From the name "journal-test" I guess you are trying something like
He uses very similar parameters, except with "--sync=1", not

This is a proper benchmark for the old ceph filestore journal, as this
was written linearly, and in the worst case could have been written in
chunks as small as 4k.
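
For reference, a journal-style test of this kind can be sketched as below.
The helper only prints an fio command line and does not run it. Apart from
the job name "journal-test", 4k writes, direct I/O and the fsync/sync
choice, every parameter here (queue depth, job count, runtime, target
path) is an assumption for illustration, not the exact command used in
this thread.

```shell
# Build (but do not run) an fio command line for a 4k sequential write test.
# mode: "fsync" -> --fsync=1 (issue fsync() after every write)
#       "sync"  -> --sync=1  (open the target with O_SYNC)
#       "none"  -> no per-write flush
# WARNING: running the printed command against a raw device overwrites data.
journal_test_cmd() {
    target="$1"
    mode="$2"
    case "$mode" in
        fsync) flush="--fsync=1" ;;
        sync)  flush="--sync=1" ;;
        none)  flush="" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # $flush is left unquoted on purpose so an empty value adds no argument.
    echo "fio --name=journal-test --filename=$target --rw=write --bs=4k" \
         "--direct=1 --numjobs=1 --iodepth=1 --runtime=60 --time_based" $flush
}

# Example (target is a placeholder):
# journal_test_cmd /dev/bcache0 fsync
```

Printing the three variants side by side makes it easy to run the same
job with and without the flush and compare the IOPS fio reports.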

As you are using proxmox, I guess you want to use its ceph component.
They use the modern ceph bluestore format, and there is no journal
anymore.  I don't know if the bluestore WAL exhibits similar access
patterns as the old journal and if this benchmark still has real-world
relevance.  But if you have enough NVMe disk space, you are advised to
put bluestore WAL and ideally also the bluestore DB directly on NVMe,
and use bcache only for the bluestore data part. If you do so, make sure
to set rotational=1 on the bcache device before creating the OSD, or
ceph will use unsuitable bluestore parameters, possibly overwhelming the

