On Thu, Oct 15, 2015 at 5:11 PM, Butkeev Stas <staerist@xxxxx> wrote:
> Hello all,
> Has anybody tried to use CephFS?
>
> I have two servers running RHEL 7.1 (latest kernel 3.10.0-229.14.1.el7.x86_64). Each server has 15 GB of flash for the Ceph journal and 12 x 2 TB SATA disks for data.
> There is a 56 Gb/s InfiniBand (IPoIB) interconnect between the nodes.
>
> Cluster version:
> # ceph -v
> ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>
> Cluster config:
> # cat /etc/ceph/ceph.conf
> [global]
> auth service required = cephx
> auth client required = cephx
> auth cluster required = cephx
> fsid = 0f05deaf-ee6f-4342-b589-5ecf5527aa6f
> mon osd full ratio = .95
> mon osd nearfull ratio = .90
> osd pool default size = 2
> osd pool default min size = 1
> osd pool default pg num = 32
> osd pool default pgp num = 32
> max open files = 131072
> osd crush chooseleaf type = 1
> [mds]
>
> [mds.a]
> host = ak34
>
> [mon]
> mon_initial_members = a,b
>
> [mon.a]
> host = ak34
> mon addr = 172.24.32.134:6789
>
> [mon.b]
> host = ak35
> mon addr = 172.24.32.135:6789
>
> [osd]
> osd journal size = 1000
>
> [osd.0]
> osd uuid = b3b3cd37-8df5-4455-8104-006ddba2c443
> host = ak34
> public addr = 172.24.32.134
> osd journal = /CEPH_JOURNAL/osd/ceph-0/journal
> .....
>
> Below is the cluster tree:
> # ceph osd tree
> ID WEIGHT   TYPE NAME                            UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 45.75037 root default
> -2 45.75037     region RU
> -3 45.75037         datacenter ru-msk-ak48t
> -4 22.87518             host ak34
>  0  1.90627                 osd.0                     up  1.00000          1.00000
>  1  1.90627                 osd.1                     up  1.00000          1.00000
>  2  1.90627                 osd.2                     up  1.00000          1.00000
>  3  1.90627                 osd.3                     up  1.00000          1.00000
>  4  1.90627                 osd.4                     up  1.00000          1.00000
>  5  1.90627                 osd.5                     up  1.00000          1.00000
>  6  1.90627                 osd.6                     up  1.00000          1.00000
>  7  1.90627                 osd.7                     up  1.00000          1.00000
>  8  1.90627                 osd.8                     up  1.00000          1.00000
>  9  1.90627                 osd.9                     up  1.00000          1.00000
> 10  1.90627                 osd.10                    up  1.00000          1.00000
> 11  1.90627                 osd.11                    up  1.00000          1.00000
> -5 22.87518             host ak35
> 12  1.90627                 osd.12                    up  1.00000          1.00000
> 13  1.90627                 osd.13                    up  1.00000          1.00000
> 14  1.90627                 osd.14                    up  1.00000          1.00000
> 15  1.90627                 osd.15                    up  1.00000          1.00000
> 16  1.90627                 osd.16                    up  1.00000          1.00000
> 17  1.90627                 osd.17                    up  1.00000          1.00000
> 18  1.90627                 osd.18                    up  1.00000          1.00000
> 19  1.90627                 osd.19                    up  1.00000          1.00000
> 20  1.90627                 osd.20                    up  1.00000          1.00000
> 21  1.90627                 osd.21                    up  1.00000          1.00000
> 22  1.90627                 osd.22                    up  1.00000          1.00000
> 23  1.90627                 osd.23                    up  1.00000          1.00000
>
> Cluster status:
> # ceph -s
>     cluster 0f05deaf-ee6f-4342-b589-5ecf5527aa6f
>      health HEALTH_OK
>      monmap e1: 2 mons at {a=172.24.32.134:6789/0,b=172.24.32.135:6789/0}
>             election epoch 10, quorum 0,1 a,b
>      mdsmap e14: 1/1/1 up {0=a=up:active}
>      osdmap e194: 24 osds: 24 up, 24 in
>       pgmap v2305: 384 pgs, 3 pools, 271 GB data, 72288 objects
>             545 GB used, 44132 GB / 44678 GB avail
>                  384 active+clean
>
> Pools for CephFS:
> # ceph osd dump | grep pg
> pool 1 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 154 flags hashpspool crash_replay_interval 45 stripe_width 0
> pool 2 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 144 flags hashpspool stripe_width 0
>
> Rados bench:
> # rados bench -p cephfs_data 300 write --no-cleanup && rados bench -p cephfs_data 300 seq
> Maintaining 16 concurrent writes of 4194304 bytes for up to 300 seconds or 0 objects
> Object prefix: benchmark_data_XXXXXXXXXXXXXXXXXXXX_8108
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     0       0         0         0         0         0         -         0
>     1      16       170       154    615.74       616  0.109984 0.0978277
>     2      16       335       319   637.817       660 0.0623079 0.0985001
>     3      16       496       480   639.852       644 0.0992808 0.0982317
>     4      16       662       646   645.862       664 0.0683485 0.0980203
>     5      16       831       815   651.796       676 0.0773545 0.0973635
>     6      15       994       979   652.479       656  0.112323  0.096901
>     7      16      1164      1148   655.826       676  0.107592 0.0969845
>     8      16      1327      1311   655.335       652 0.0960067 0.0968445
>     9      16      1488      1472   654.066       644 0.0780589 0.0970879
> .....
>   297      16     43445     43429   584.811       596 0.0569516  0.109399
>   298      16     43601     43585   584.942       624 0.0707439  0.109388
>   299      16     43756     43740   585.059       620   0.20408  0.109363
> 2015-10-15 14:16:59.622610 min lat: 0.0109677 max lat: 0.951389 avg lat: 0.109344
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>   300      13     43901     43888   585.082       592 0.0768806  0.109344
> Total time run:        300.329089
> Total reads made:      43901
> Read size:             4194304
> Bandwidth (MB/sec):    584.705
>
> Average Latency:       0.109407
> Max latency:           0.951389
> Min latency:           0.0109677
>
> But the real write speed is very low:
>
> # dd if=/dev/zero | pv | dd oflag=direct of=44444 bs=4k count=10k
> 10240+0 records in
> 10240+0 records out
> 41943040 bytes (42 MB) copied, 25.9155 s, 1.6 MB/s
> 40.1MiB 0:00:25 [1.55MiB/s] [ <=> ]
>
> # dd if=/dev/zero | pv | dd oflag=direct of=44444 bs=32k count=10k
> 10240+0 records in
> 10240+0 records out
> 335544320 bytes (336 MB) copied, 28.2998 s, 11.9 MB/s
> 320MiB 0:00:28 [11.3MiB/s] [ <=> ]

So what happens if you continue increasing the 'bs' parameter? Is bs=1M nice and fast? (A rough block-size sweep is sketched at the end of this message.)

John

> Do you know the root cause of the low write speed to the filesystem?
>
> Thank you in advance for your help!
>
> --
> Best Regards,
> Stanislav Butkeev

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
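
To expand on the block-size question: with oflag=direct every write is synchronous, so throughput is roughly bs divided by the per-write latency. The figures above (1.6 MB/s at bs=4k, 11.9 MB/s at bs=32k) both work out to roughly 2.5-3 ms per write, which is consistent with each small write paying a full round trip to the primary OSD plus a replicated journal commit; larger blocks should therefore scale almost linearly until the streaming limit is reached. A rough sweep along the lines John suggests might look like the following sketch (the /mnt/cephfs mount point, test file name, and sizes are illustrative assumptions, not taken from the thread):

#!/bin/sh
# Rough dd block-size sweep against an assumed CephFS mount at /mnt/cephfs.
# O_DIRECT keeps the page cache out of the way, so each run shows how
# throughput grows with block size while the per-write latency stays fixed.
for bs in 4k 32k 128k 1M 4M; do
    echo "=== bs=$bs ==="
    # dd prints its throughput summary on stderr; keep only that last line.
    dd if=/dev/zero of=/mnt/cephfs/ddtest bs=$bs count=1000 oflag=direct 2>&1 | tail -n 1
    rm -f /mnt/cephfs/ddtest
done

If bs=1M and bs=4M come out close to the rados bench figure, the bottleneck is the per-request latency of small synchronous writes rather than the cluster's streaming bandwidth.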