FYI, when I performed testing on our cluster I saw the same thing. A fio randwrite 4k test over a large volume was a lot faster with a larger RBD object size (8 MB was marginally better than the default 4 MB). It makes no sense to me unless there is a huge overhead with an increasing number of objects. Or maybe there is some sort of alignment problem that causes small objects to overlap with the actual workload. (In my cluster some objects are mysteriously sized as 4MiB-4KiB.)
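In case anyone wants to reproduce the comparison: the object size is fixed at image creation time. A minimal sketch, assuming a pool named rbd and a made-up image name (on hammer the knob is --order, where object size is 2^order bytes, so 23 gives 8 MiB):
>>>>
# create a 100 GB test image with 8 MiB objects instead of the default 4 MiB (order 22)
rbd create --size 102400 --order 23 rbd/objsize-test
# confirm the resulting object size/order
rbd info rbd/objsize-test
<<<<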
Jan

Hi Christian,

Thanks for your reply, here are the test specs:
>>>>
[global]
ioengine=libaio
runtime=90
direct=1
group_reporting
iodepth=16
ramp_time=5
size=1G

[seq_w_4k_20]
bs=4k
filename=seq_w_4k_20
rw=write
numjobs=20

[seq_w_1m_20]
bs=1m
filename=seq_w_1m_20
rw=write
numjobs=20
<<<<
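If it helps, a job file like this can be run one section at a time with fio's --section option (assuming the file is saved as jobs.fio):
>>>>
fio --section=seq_w_4k_20 jobs.fio
fio --section=seq_w_1m_20 jobs.fio
<<<<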
Test results: 4k - aggrb=13245KB/s, 1m - aggrb=1102.6MB/s
Ceph configuration:
>>>>
filestore_xattr_use_omap = true
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 128
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 512
osd pool default pgp num = 512
osd crush chooseleaf type = 1
<<<<
Other configurations are all default.
Status:
>>>>
health HEALTH_OK
election epoch 28, quorum 0,1,2,3,4 GGZ-YG-S0311-PLATFORM-138,1,2,3,4
mdsmap e55: 1/1/1 up {0=1=up:active}
osdmap e1290: 20 osds: 20 up, 20 in
pgmap v7180: 1000 pgs, 2 pools, 14925 MB data, 3851 objects
37827 MB used, 20837 GB / 21991 GB avail
1000 active+clean
<<<<

On Fri, 25 Mar 2016 at 16:44 Christian Balzer <chibi@xxxxxxx> wrote:
Hello,
On Fri, 25 Mar 2016 08:11:27 +0000 Zhang Qiang wrote:
> Hi all,
>
> According to fio,
Exact fio command please.
> with 4k block size, the sequential write performance of
> my ceph-fuse mount
Exact mount options, ceph config (RBD cache) please.
> is just about 20+ MB/s; only 200 Mb of the 1 Gb full
> duplex NIC's outgoing bandwidth was used at maximum. But with a 1M block
> size the performance could reach as high as 1000 MB/s, approaching the
> limit of the NIC bandwidth. Why do the performance stats differ so much
> for different block sizes?
That's exactly why.
You can see this with locally attached storage as well: many small requests
are slower than large (essentially sequential) writes.
Network-attached storage in general (latency), and thus Ceph as well (plus
code overhead), amplifies that.
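To put rough numbers on it using the results above: 13245 KB/s ÷ 4 KB ≈ 3300 write ops/s for the 4k job, while 1102.6 MB/s ÷ 1 MB ≈ 1100 ops/s for the 1m job. Both runs complete the same order of magnitude of operations per second; the 4k run simply moves 256 times less data per operation, so per-operation latency rather than raw bandwidth is the limit.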
> Can I configure the ceph-fuse mount's block size
> for maximum performance?
>
Very little you can do about that if you're using sync writes (hence the
request for the exact fio command line); if not, RBD cache could/should help.
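For reference, a minimal ceph.conf sketch of what I mean by RBD cache (this is the librbd client-side cache, so it applies to RBD clients rather than a ceph-fuse mount; the values shown are just the defaults):
>>>>
[client]
rbd cache = true
# 32 MiB of client-side write-back cache
rbd cache size = 33554432
# stay in writethrough mode until the first flush, for safety
rbd cache writethrough until flush = true
<<<<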
Christian
> Basic information about the cluster: 20 OSDs on separate PCIe hard disks
> distributed across 2 servers, each with write performance of about 300 MB/s;
> 5 MONs; 1 MDS. Ceph version 0.94.6
> (e832001feaf8c176593e0325c8298e3f16dfb403).
>
> Thanks :)
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Global OnLine Japan/Rakuten Communications
http://www.gol.com/