FYI, when I performed testing on our cluster I saw the same thing. A fio randwrite 4k test over a large volume was a lot faster with a larger RBD object size (8 MB was marginally better than the default 4 MB). It makes no sense to me unless there is a huge overhead with an increasing number of objects. Or maybe there is some sort of alignment problem that causes small objects to overlap with the actual workload. (In my cluster some objects are mysteriously sized as 4MiB-4KiB.)
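In case anyone wants to reproduce the comparison: the object size is fixed at image creation time. A minimal sketch, assuming a pool named rbd and a made-up image name (on hammer the knob is --order, where object size is 2^order bytes, so 23 gives 8 MiB):
>>>>
# create a 100 GB test image with 8 MiB objects instead of the default 4 MiB (order 22)
rbd create --size 102400 --order 23 rbd/objsize-test
# confirm the resulting object size/order
rbd info rbd/objsize-test
<<<<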
Jan

Hi Christian,

Thanks for your reply, here are the test specs:
>>>>
[global]
ioengine=libaio
runtime=90
direct=1
group_reporting
iodepth=16
ramp_time=5
size=1G

[seq_w_4k_20]
bs=4k
filename=seq_w_4k_20
rw=write
numjobs=20

[seq_w_1m_20]
bs=1m
filename=seq_w_1m_20
rw=write
numjobs=20
<<<<
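If it helps, a job file like this can be run one section at a time with fio's --section option (assuming the file is saved as jobs.fio):
>>>>
fio --section=seq_w_4k_20 jobs.fio
fio --section=seq_w_1m_20 jobs.fio
<<<<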
Test results: 4k - aggrb=13245KB/s, 1m - aggrb=1102.6MB/s
Ceph configuration:
>>>>
filestore_xattr_use_omap = true
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 128
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 512
osd pool default pgp num = 512
osd crush chooseleaf type = 1
<<<<
Other configurations are all default.
Status:
>>>>
health HEALTH_OK
election epoch 28, quorum 0,1,2,3,4 GGZ-YG-S0311-PLATFORM-138,1,2,3,4
mdsmap e55: 1/1/1 up {0=1=up:active}
osdmap e1290: 20 osds: 20 up, 20 in
pgmap v7180: 1000 pgs, 2 pools, 14925 MB data, 3851 objects
37827 MB used, 20837 GB / 21991 GB avail
1000 active+clean
<<<<

On Fri, 25 Mar 2016 at 16:44 Christian Balzer <chibi@xxxxxxx> wrote:
Hello,
On Fri, 25 Mar 2016 08:11:27 +0000 Zhang Qiang wrote:
> Hi all,
>
> According to fio,
Exact fio command please.
> with 4k block size, the sequential write performance of
> my ceph-fuse mount
Exact mount options, ceph config (RBD cache) please.
> is just about 20+ MB/s; only 200 Mb of the 1 Gb full
> duplex NIC's outgoing bandwidth was used at maximum. But with a 1M block
> size the performance could reach as high as 1000 MB/s, approaching the
> limit of the NIC bandwidth. Why do the performance stats differ so much
> for different block sizes?
That's exactly why.
You can see this with locally attached storage as well: many small requests
are slower than large (essentially sequential) writes.
Network-attached storage in general (latency), and thus Ceph as well (plus
code overhead), amplifies that.
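To put rough numbers on it using the results above: 13245 KB/s ÷ 4 KB ≈ 3300 write ops/s for the 4k job, while 1102.6 MB/s ÷ 1 MB ≈ 1100 ops/s for the 1m job. Both runs complete the same order of magnitude of operations per second; the 4k run simply moves 256 times less data per operation, so per-operation latency rather than raw bandwidth is the limit.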
> Can I configure the ceph-fuse mount's block size
> for maximum performance?
>
Very little you can do about that if you're using sync writes (hence the
request for the exact fio command line); if not, RBD cache could/should help.
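For reference, a minimal ceph.conf sketch of what I mean by RBD cache (this is the librbd client-side cache, so it applies to RBD clients rather than a ceph-fuse mount; the values shown are just the defaults):
>>>>
[client]
rbd cache = true
# 32 MiB of client-side write-back cache
rbd cache size = 33554432
# stay in writethrough mode until the first flush, for safety
rbd cache writethrough until flush = true
<<<<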
Christian
> Basic information about the cluster: 20 OSDs on separate PCIe hard disks
> distributed across 2 servers, each with write performance of about 300 MB/s;
> 5 MONs; 1 MDS. Ceph version 0.94.6
> (e832001feaf8c176593e0325c8298e3f16dfb403).
>
> Thanks :)
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Global OnLine Japan/Rakuten Communications
http://www.gol.com/