Dear Ceph community,

I have a very small Ceph cluster for testing, with this configuration:

2x compute nodes, each with:
- dual-port 25 GbE NIC
- 2x CPU sockets (56 cores with hyperthreading)
- 10x Intel NVMe DC P3500 drives
- 512 GB RAM

One of the nodes also runs as a monitor. Installation was done using ceph-ansible.

Ceph version: jewel
Storage engine: filestore

Performance tests below:
[root@zeus-59 ceph-block-device]# ceph osd pool ls detail
pool 0 'rbd' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 115 flags hashpspool stripe_width 0
pool 1 'images' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 118 flags hashpspool stripe_width 0
        removed_snaps [1~3,7~4]
pool 3 'backups' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 120 flags hashpspool stripe_width 0
pool 4 'vms' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 122 flags hashpspool stripe_width 0
        removed_snaps [1~7]
pool 5 'volumes' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 124 flags hashpspool stripe_width 0
        removed_snaps [1~3]
pool 6 'scbench' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 126 flags hashpspool stripe_width 0
pool 7 'rbdbench' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 128 flags hashpspool stripe_width 0
        removed_snaps [1~3]

[root@zeus-59 ceph-block-device]# ceph osd tree
ID WEIGHT   TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 36.17371 root default
-2 18.08685     host zeus-58
 0  1.80869         osd.0         up  1.00000          1.00000
 2  1.80869         osd.2         up  1.00000          1.00000
 4  1.80869         osd.4         up  1.00000          1.00000
 6  1.80869         osd.6         up  1.00000          1.00000
 8  1.80869         osd.8         up  1.00000          1.00000
10  1.80869         osd.10        up  1.00000          1.00000
12  1.80869         osd.12        up  1.00000          1.00000
14  1.80869         osd.14        up  1.00000          1.00000
16  1.80869         osd.16        up  1.00000          1.00000
18  1.80869         osd.18        up  1.00000          1.00000
-3 18.08685     host zeus-59
 1  1.80869         osd.1         up  1.00000          1.00000
 3  1.80869         osd.3         up  1.00000          1.00000
 5  1.80869         osd.5         up  1.00000          1.00000
 7  1.80869         osd.7         up  1.00000          1.00000
 9  1.80869         osd.9         up  1.00000          1.00000
11  1.80869         osd.11        up  1.00000          1.00000
13  1.80869         osd.13        up  1.00000          1.00000
15  1.80869         osd.15        up  1.00000          1.00000
17  1.80869         osd.17        up  1.00000          1.00000
19  1.80869         osd.19        up  1.00000          1.00000

[root@zeus-59 ceph-block-device]# ceph status
    cluster 8e930b6c-455e-4328-872d-cb9f5c0359ae
     health HEALTH_OK
     monmap e1: 1 mons at {zeus-59=10.0.32.59:6789/0}
            election epoch 3, quorum 0 zeus-59
     osdmap e129: 20 osds: 20 up, 20 in
            flags sortbitwise,require_jewel_osds
      pgmap v1166945: 776 pgs, 7 pools, 1183 GB data, 296 kobjects
            2363 GB used, 34678 GB / 37042 GB avail
                 775 active+clean
                   1 active+clean+scrubbing+deep

[root@zeus-59 ceph-block-device]# rados bench -p scbench 10 write --no-cleanup
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_zeus-59.localdomain_2844050
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16       644       628    2511.4      2512    0.0210273    0.025206
    2      16      1319      1303   2605.49      2700    0.0238678   0.0243974
    3      16      2003      1987   2648.89      2736    0.0201334   0.0240726
    4      16      2669      2653   2652.59      2664    0.0258618   0.0240468
    5      16      3349      3333   2666.01      2720    0.0189464   0.0239484
    6      16      4026      4010   2672.96      2708      0.02215   0.0238954
    7      16      4697      4681   2674.49      2684    0.0217258   0.0238887
    8      16      5358      5342   2670.64      2644    0.0265384   0.0239066
    9      16      6043      6027    2678.3      2740    0.0260798   0.0238637
   10      16      6731      6715   2685.64      2752    0.0174624   0.0237982
Total time run:         10.026091
Total writes made:      6731
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     2685.39
Stddev Bandwidth:       70.0286
Max bandwidth (MB/sec): 2752
Min bandwidth (MB/sec): 2512
Average IOPS:           671
Stddev IOPS:            17
Max IOPS:               688
Min IOPS:               628
Average Latency(s):     0.023819
Stddev Latency(s):      0.00463709
Max latency(s):         0.0594516
Min latency(s):         0.0138556

[root@zeus-59 ceph-block-device]# rados bench -p scbench 10 seq
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      15      1150      1135   4498.75      4540    0.0146433   0.0131456
    2      15      2313      2298   4571.38      4652    0.0144489   0.0131564
    3      15      3468      3453   4585.68      4620   0.00975626   0.0131211
    4      15      4663      4648   4633.41      4780    0.0163181   0.0130076
    5      15      5949      5934   4734.49      5144   0.00944718   0.0127327
Total time run:       5.643929
Total reads made:     6731
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   4770.43
Average IOPS:         1192
Stddev IOPS:          59
Max IOPS:             1286
Min IOPS:             1135
Average Latency(s):   0.0126349
Max latency(s):       0.0490061
Min latency(s):       0.00613382

[root@zeus-59 ceph-block-device]# rados bench -p scbench 10 rand
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      15      1197      1182    4726.8      4728    0.0130331    0.012711
    2      15      2364      2349   4697.02      4668    0.0105971   0.0128123
    3      15      3686      3671   4893.78      5288   0.00906867   0.0123103
    4      15      4994      4979   4978.16      5232   0.00946901    0.012104
    5      15      6302      6287   5028.83      5232    0.0115159   0.0119879
    6      15      7620      7605   5069.28      5272   0.00986636   0.0118935
    7      15      8912      8897   5083.31      5168    0.0106201   0.0118648
    8      15     10185     10170   5084.34      5092    0.0116891   0.0118632
    9      15     11484     11469   5096.68      5196   0.00911787   0.0118354
   10      16     12748     12732   5092.16      5052    0.0111988   0.0118476
Total time run:       10.020135
Total reads made:     12748
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   5088.95
Average IOPS:         1272
Stddev IOPS:          55
Max IOPS:             1322
Min IOPS:             1167
Average Latency(s):   0.0118531
Max latency(s):       0.0441046
Min latency(s):       0.00590162
[root@zeus-59 ceph-block-device]# rbd bench-write image01 --pool=rbdbench
bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
  SEC       OPS   OPS/SEC     BYTES/SEC
    1     56159  56180.51  230115361.66
    2    119975  59998.28  245752967.01
    3    182956  60990.78  249818235.33
    4    244195  61054.17  250077889.88
elapsed:     4  ops:   262144  ops/sec:  60006.56  bytes/sec:  245786880.86

I am far from a Ceph/storage expert, but my feeling is that the numbers reported by rbd bench-write are quite poor considering the hardware I am using (please correct me if I am wrong). I would like to ask the community for help digging into this issue and finding out what is throttling performance (CPU? memory? network configuration? not enough data nodes? not enough OSDs per disk? CPU pinning? etc.).
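One thing I notice is that the rbd bench-write run above used its defaults (io_size 4096, sequential pattern, as shown in its own header line), so the ~245 MB/s corresponds to the ~60k ops/sec it reports, i.e. a 4 KiB IOPS test rather than a bandwidth test like the 4 MiB rados bench runs. To compare like-for-like, my next step is something along the lines of the sketch below; the --io-total value is an arbitrary guess on my part, and the fio job is adapted from the examples/rbd.fio file that ships with fio (the pool and image names are the ones from my setup above):

# (sketch) re-run the rbd benchmark with 4 MiB writes to match rados bench
rbd bench-write image01 --pool=rbdbench --io-size 4194304 --io-threads 16 --io-total 10737418240

# (sketch) 4 KiB random writes through librbd via fio's rbd engine
cat > rbd-4k-randwrite.fio <<'EOF'
[global]
ioengine=rbd
clientname=admin
pool=rbdbench
rbdname=image01
invalidate=0
rw=randwrite
bs=4k

[rbd_iodepth32]
iodepth=32
EOF
fio rbd-4k-randwrite.fio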
Apologies beforehand: I know this is quite a broad topic and not an easy one to give an exact answer to, but I would appreciate some guidance, and I hope this can become an interesting performance-troubleshooting thread for other people who are learning distributed storage and Ceph.
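Since network configuration is on my suspect list, one data point I can collect up front is a raw throughput check across the 25 GbE link between the nodes, roughly as sketched below (I am assuming zeus-58 answers on 10.0.32.58, by analogy with zeus-59 being 10.0.32.59; I have not verified that):

# (sketch) raw TCP throughput between the two nodes
# on zeus-58:
#   iperf3 -s
# on zeus-59:
iperf3 -c 10.0.32.58 -P 4 -t 30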
Thank you very much,

Manuel Sopena Ballesteros | Systems engineer