Hi, I’m running a three server cluster (one monitor, two OSD) and am having a problem where after adding the second OSD server, my read rate drops significantly and eventually the reads stall (writes are improved as expected). Attached is a
log of the rados benchmarks for the two configurations and below is my hardware configuration. I’m not using replicas (capacity is more important than uptime for our use case) and am using a single 10GbE network. The pool (rbd) is configured with 128 placement
groups. I’ve checked the CPU utilization of the ceph-osd processes and they all hover around 10% until the stall. After the stall, the CPU usage is 0% and the disks all show zero operations via iostat. Iperf reports 9.9Gb/s between the monitor
and OSD servers. I’m looking for any advice/help on how to identify the source of this issue as my attempts so far have proven fruitless… Monitor server: 2x E5-2680V3 32GB DDR4 2x 4TB HDD in RAID1 on an Avago/LSI 3108 with Cachevault, configured as write-back 10GbE OSD servers: 2x E5-2680V3 128GB DDR4 2x 8+2 RAID6 using 8TB SAS12 drives on an Avago/LSI 9380 controller with Cachevault, configured as write-back. - Each RAID6 is an OSD 10GbE Thanks, Tom Helander KLA-Tencor
CONFIDENTIALITY NOTICE: This e-mail transmission, and any documents, files or previous e-mail messages attached to it, may contain confidential information. If you are not the intended
recipient, or a person responsible for delivering it to the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of any of the information contained in or attached to this message is STRICTLY PROHIBITED. If you have
received this transmission in error, please immediately notify us by reply e-mail at
thomas.helander@xxxxxxxxxxxxxx or by telephone at (408) 875-7819, and destroy the original
transmission and its attachments without reading them or saving them to disk. Thank you. |
[root@control01 system_initialize]# cat /etc/ceph/ceph.conf [global] fsid = 9501b36d-0ac0-44ec-b740-9906d092bac2 mon initial members = control01 mon host = 192.168.2.11:6789 auth client required = cephx auth cluster required = cephx auth service required = cephx cluster network = 192.168.2.0/23 osd crush chooseleaf type = 1 osd pool default min size = 1 osd pool default pg num = 128 osd pool default size = 1 public network = 192.168.2.0/23 [root@control01 system_initialize]# ceph -s cluster 9501b36d-0ac0-44ec-b740-9906d092bac2 health HEALTH_OK monmap e1: 1 mons at {control01=192.168.2.11:6789/0} election epoch 2, quorum 0 control01 osdmap e40: 2 osds: 2 up, 2 in flags sortbitwise pgmap v1826: 128 pgs, 1 pools, 0 bytes data, 0 objects 102 MB used, 116 TB / 116 TB avail 128 active+clean [root@control01 system_initialize]# ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -1 116.41660 root default -2 116.41660 host storage01 0 58.20830 osd.0 up 1.00000 1.00000 1 58.20830 osd.1 up 1.00000 1.00000 [root@control01 system_initialize]# rados -p rbd bench 60 write --no-cleanup Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds Object prefix: benchmark_data_control01.adc_16939 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 0 0 0 0 0 0 - 0 1 16 180 164 655.951 656 0.0712128 0.0943403 2 16 323 307 613.933 572 0.0132957 0.0882554 3 16 421 405 539.946 392 0.0121464 0.105845 4 16 503 487 486.951 3280.00786829 0.117621 5 16 650 634 507.149 588 0.0344602 0.125399 6 16 730 714 475.952 320 0.0465193 0.133839 7 16 804 788 450.241 296 0.0704427 0.133889 8 16 952 936 467.954 592 0.103373 0.135459 9 16 1078 1062 471.956 504 0.172915 0.134309 10 16 1166 1150 459.956 352 0.0232164 0.136015 11 16 1282 1266 460.32 464 0.226732 0.137086 12 16 1378 1362 453.956 384 0.273416 0.140061 13 16 1504 1488 457.802 504 0.0169217 0.138367 14 16 1597 1581 451.671 372 0.0261604 0.140748 15 16 1693 1677 447.157 384 0.30309 0.142337 16 16 1774 1758 439.458 324 0.0165604 0.142555 17 16 1913 1897 446.31 556 0.0215074 0.141801 18 16 2001 1985 441.068 352 0.0326713 0.143268 19 16 2117 2101 442.272 464 0.0631854 0.144277 2016-08-01 08:46:56.636292 min lat: 0.00786046 max lat: 0.796235 avg lat: 0.144196 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 20 16 2213 2197 439.357 384 0.062006 0.144196 21 16 2338 2322 442.243 500 0.0504776 0.144552 22 16 2431 2415 439.049 372 0.145096 0.144634 23 16 2503 2487 432.48 288 0.0134286 0.147554 24 16 2626 2610 434.958 492 0.0166429 0.146283 25 16 2736 2720 435.158 440 0.158851 0.146813 26 16 2843 2827 434.881 428 0.178634 0.146422 27 16 2933 2917 432.106 360 0.0309586 0.147309 28 16 3034 3018 431.101 404 0.0252305 0.147202 29 16 3163 3147 434.027 516 0.0161699 0.146168 30 16 3267 3251 433.424 416 0.0226611 0.147184 31 16 3319 3303 426.152 208 0.0152707 0.149782 32 16 3425 3409 426.083 424 0.650789 0.148421 33 16 3583 3567 432.322 632 0.11366 0.147673 34 16 3679 3663 430.899 384 0.0202234 0.147264 35 16 3796 3780 431.958 468 0.0145311 0.147476 36 16 3889 3873 430.291 372 0.0385374 0.148281 37 16 4018 4002 432.607 516 0.333854 0.147036 38 16 4130 4114 433.011 448 0.0200742 0.147417 39 16 4240 4224 433.189 440 0.182814 0.147335 2016-08-01 08:47:16.638234 min lat: 0.00767328 max lat: 1.06087 avg lat: 0.147074 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 40 16 4350 4334 433.358 440 0.0622486 0.147074 41 16 4444 4428 431.958 376 0.320389 0.14782 42 16 4575 4559 434.148 524 0.0394409 0.14701 43 16 4665 4649 432.423 360 0.330111 0.147809 44 16 4760 4744 431.231 380 0.0339363 0.147463 45 16 4888 4872 433.024 512 0.0162028 0.147562 46 16 4975 4959 431.176 348 0.0587093 0.148274 47 16 5054 5038 428.725 316 0.142593 0.148945 48 16 5172 5156 429.625 472 0.0571224 0.148851 49 16 5246 5230 426.897 296 0.0103518 0.149479 50 16 5370 5354 428.278 496 0.0387389 0.148274 51 16 5485 5469 428.899 460 0.0471625 0.148994 52 16 5571 5555 427.266 344 0.286425 0.149251 53 16 5708 5692 429.543 548 0.0409842 0.148532 54 16 5747 5731 424.477 156 1.02458 0.149638 55 16 5896 5880 427.594 596 0.054289 0.14954 56 16 6003 5987 427.6 428 0.378443 0.149219 57 16 6130 6114 429.01 508 0.134211 0.149041 58 16 6203 6187 426.647 2920.00769267 0.149129 59 16 6327 6311 427.822 496 0.136888 0.149386 2016-08-01 08:47:36.640301 min lat: 0.00766789 max lat: 1.07989 avg lat: 0.14986 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 60 16 6410 6394 426.224 332 0.0289986 0.14986 Total time run: 60.224132 Total writes made: 6411 Write size: 4194304 Bandwidth (MB/sec): 426 Stddev Bandwidth: 102 Max bandwidth (MB/sec): 656 Min bandwidth (MB/sec): 156 Average IOPS: 106 Stddev IOPS: 25 Max IOPS: 164 Min IOPS: 39 Average Latency: 0.15 Stddev Latency: 0.148 Max latency: 1.08 Min latency: 0.00767 [root@control01 system_initialize]# ssh storage01 'echo 3 > /proc/sys/vm/drop_caches' [root@control01 system_initialize]# rados -p rbd bench 60 seq sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 0 1 1 0 0 0 - 0 1 16 145 129 515.906 516 0.0167339 0.11488 2 16 287 271 541.922 568 0.0522996 0.114998 3 16 430 414 551.926 572 0.0524102 0.112196 4 16 582 566 565.93 608 0.0769078 0.110808 5 16 725 709 567.131 572 0.0392817 0.109597 6 16 879 863 575.265 616 0.139403 0.109623 7 16 1030 1014 579.361 604 0.0608292 0.108439 8 16 1176 1160 579.932 584 0.0363153 0.108604 9 16 1325 1309 581.71 596 0.0522732 0.108434 10 16 1461 1445 577.932 544 0.192011 0.109137 11 16 1618 1602 582.478 628 0.0856128 0.108291 12 16 1770 1754 584.6 608 0.0983951 0.10812 13 15 1925 1910 587.612 624 0.249809 0.107948 14 16 2074 2058 587.921 592 0.0328021 0.107558 15 15 2232 2217 591.099 636 0.0865074 0.107138 16 16 2391 2375 593.651 632 0.0583012 0.106681 17 16 2541 2525 594.021 600 0.0598446 0.10664 18 16 2689 2673 593.905 592 0.150494 0.106773 19 16 2829 2813 592.119 560 0.135322 0.107046 2016-08-01 08:49:13.364855 min lat: 0.00725 max lat: 0.474863 avg lat: 0.10682 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 20 16 2986 2970 593.91 628 0.140958 0.10682 21 16 3129 3113 592.864 572 0.0663896 0.106913 22 16 3292 3276 595.55 652 0.0777842 0.106593 23 16 3446 3430 596.436 616 0.0902701 0.106483 24 16 3582 3566 594.249 544 0.0267909 0.106545 25 16 3725 3709 593.358 572 0.202035 0.106958 26 16 3878 3862 594.073 612 0.075774 0.106808 27 16 4029 4013 594.439 604 0.040735 0.106731 28 16 4194 4178 596.778 660 0.0535983 0.106471 29 16 4346 4330 597.164 608 0.0733419 0.106291 30 16 4488 4472 596.19 568 0.0503257 0.106554 31 16 4636 4620 596.053 592 0.0621321 0.106497 32 16 4793 4777 597.05 628 0.148522 0.106465 33 15 4944 4929 597.376 608 0.0346483 0.106377 34 16 5099 5083 597.922 616 0.217152 0.106319 35 16 5251 5235 598.208 608 0.038321 0.106254 36 15 5395 5380 597.695 580 0.0631969 0.106301 37 16 5546 5530 597.756 600 0.307726 0.106317 38 16 5691 5675 597.288 580 0.0822542 0.106466 39 16 5848 5832 598.074 628 0.0899488 0.106331 2016-08-01 08:49:33.367138 min lat: 0.00675095 max lat: 0.474863 avg lat: 0.106496 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 40 16 5989 5973 597.221 564 0.139705 0.106496 41 16 6141 6125 597.482 608 0.116482 0.10645 42 16 6295 6279 597.922 616 0.158778 0.106378 Total time run: 42.887748 Total reads made: 6411 Read size: 4194304 Bandwidth (MB/sec): 598 Average IOPS 149 Stddev IOPS: 7 Max IOPS: 165 Min IOPS: 129 Average Latency: 0.106 Max latency: 0.475 Min latency: 0.00675 [root@control01 system_initialize]# rados -p rbd cleanup ### ADDED SECOND SERVER HERE [root@control01 system_initialize]# ceph -s cluster 9501b36d-0ac0-44ec-b740-9906d092bac2 health HEALTH_OK monmap e1: 1 mons at {control01=192.168.2.11:6789/0} election epoch 2, quorum 0 control01 osdmap e48: 4 osds: 4 up, 4 in flags sortbitwise pgmap v1881: 128 pgs, 1 pools, 0 bytes data, 0 objects 178 MB used, 232 TB / 232 TB avail 128 active+clean [root@control01 system_initialize]# ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -1 232.83319 root default -2 116.41660 host storage01 0 58.20830 osd.0 up 1.00000 1.00000 1 58.20830 osd.1 up 1.00000 1.00000 -3 116.41660 host storage02 2 58.20830 osd.2 up 1.00000 1.00000 3 58.20830 osd.3 up 1.00000 1.00000 [root@control01 system_initialize]# rados -p rbd bench 60 write --no-cleanup Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds Object prefix: benchmark_data_control01.adc_19527 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 0 0 0 0 0 0 - 0 1 16 295 279 1115.89 1116 0.0914904 0.0553005 2 16 571 555 1109.87 1104 0.0793885 0.0558428 3 16 785 769 1025.21 856 0.0806213 0.0607349 4 15 940 925 924.889 624 0.0963824 0.0686032 5 16 1142 1126 900.693 804 0.0120744 0.0689868 6 16 1260 1244 829.236 472 0.559114 0.0733209 7 16 1518 1502 858.186 1032 0.060122 0.0739932 8 16 1685 1669 834.404 668 0.0257534 0.0762295 9 16 1907 1891 840.349 888 0.0351589 0.076069 10 16 2107 2091 836.305 800 0.0154241 0.0760561 11 16 2245 2229 810.451 5520.00834552 0.0780325 12 16 2462 2446 815.237 868 0.103543 0.0783068 13 16 2707 2691 827.903 980 0.0453284 0.0771251 14 16 2927 2911 831.618 880 0.0565513 0.0767363 15 16 3091 3075 819.907 656 0.0160208 0.0779284 16 16 3310 3294 823.406 876 0.154445 0.077495 17 16 3498 3482 819.2 752 0.0576891 0.0770361 18 16 3763 3747 832.571 1060 0.062403 0.0767189 19 16 3920 3904 821.802 628 0.0191069 0.0771242 2016-08-01 08:57:40.542756 min lat: 0.00750826 max lat: 0.73383 avg lat: 0.0784191 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 20 16 4092 4076 815.109 688 0.0379247 0.0784191 21 16 4343 4327 824.098 1004 0.0087996 0.0773914 22 16 4512 4496 817.363 676 0.0221131 0.0780507 23 16 4756 4740 824.256 976 0.035069 0.0775215 24 16 4920 4904 817.242 656 0.0079845 0.0774722 25 16 5107 5091 814.47 748 0.0561364 0.0784492 26 16 5278 5262 809.449 684 0.0123023 0.0787802 27 16 5449 5433 804.8 684 0.054684 0.0793674 28 16 5592 5576 796.484 5720.00835733 0.0791427 29 16 5695 5679 783.225 412 0.0182057 0.0816075 30 16 5953 5937 791.513 1032 0.188799 0.0806221 31 16 6173 6157 794.364 880 0.0728068 0.0797684 32 16 6413 6397 799.537 960 0.02882 0.0797985 33 16 6627 6611 801.245 856 0.0589031 0.07979 34 16 6835 6819 802.147 832 0.0112783 0.0794941 35 16 7003 6987 798.427 672 0.0374195 0.0800639 36 16 7198 7182 797.912 780 0.0740644 0.0801101 37 16 7414 7398 799.696 864 0.0614531 0.0799438 38 16 7638 7622 802.227 8960.00811641 0.0792574 39 15 7788 7773 797.143 604 0.0887847 0.0802261 2016-08-01 08:58:00.544911 min lat: 0.00748072 max lat: 1.24578 avg lat: 0.0794329 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 40 16 8036 8020 801.912 988 0.033339 0.0794329 41 16 8183 8167 796.693 588 0.0193636 0.0802842 42 16 8427 8411 800.96 976 0.0465702 0.0797765 43 16 8611 8595 799.447 736 0.0105132 0.0798697 44 16 8813 8797 799.639 808 0.0372847 0.0799672 45 16 9009 8993 799.289 784 0.30762 0.079848 46 16 9172 9156 796.086 652 0.0613054 0.0803336 47 16 9383 9367 797.103 844 0.0201026 0.0801689 48 16 9575 9559 796.495 768 0.0155817 0.0802785 49 16 9780 9764 796.973 820 0.013651 0.080104 50 16 9909 9893 791.352 516 0.0864205 0.0807965 51 16 10181 10165 797.167 1088 0.0426923 0.0802229 52 16 10378 10362 796.989 788 0.0719918 0.0802605 53 16 10626 10610 800.666 992 0.0485357 0.0798906 54 16 10767 10751 796.282 564 0.0108626 0.0802162 55 16 11004 10988 799.039 9480.00754213 0.0798239 56 16 11174 11158 796.912 680 0.0712432 0.0799379 57 16 11400 11384 798.79 904 0.0263897 0.0799824 58 16 11599 11583 798.74 796 0.0557034 0.0799678 59 16 11814 11798 799.777 860 0.0232177 0.0797825 2016-08-01 08:58:20.547054 min lat: 0.00739155 max lat: 1.24578 avg lat: 0.0798717 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 60 15 12020 12005 800.246 828 0.207976 0.0798717 Total time run: 60.052154 Total writes made: 12020 Write size: 4194304 Bandwidth (MB/sec): 801 Stddev Bandwidth: 165 Max bandwidth (MB/sec): 1.12e+03 Min bandwidth (MB/sec): 412 Average IOPS: 200 Stddev IOPS: 41 Max IOPS: 279 Min IOPS: 103 Average Latency: 0.0799 Stddev Latency: 0.104 Max latency: 1.25 Min latency: 0.00739 [root@control01 system_initialize]# ssh storage01 'echo 3 > /proc/sys/vm/drop_caches' [root@control01 system_initialize]# ssh storage02 'echo 3 > /proc/sys/vm/drop_caches' [root@control01 system_initialize]# rados -p rbd bench 60 seq sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 0 0 0 0 0 0 - 0 1 16 121 105 419.921 420 0.0231405 0.110763 2 16 153 137 273.957 128 0.0425251 0.220774 3 16 193 177 235.967 160 0.994681 0.215014 4 15 258 243 242.954 264 0.789814 0.246963 5 15 314 299 239.137 224 0.0538332 0.254025 6 16 339 323 215.282 96 0.0373792 0.26461 7 16 369 353 201.67 120 0.0605347 0.289337 8 16 404 388 193.96 140 0.0139568 0.313728 9 16 434 418 185.742 120 0.214502 0.318417 10 16 467 451 180.366 132 0.673079 0.335801 11 16 493 477 173.423 104 0.241314 0.338439 12 16 499 483 160.972 24 0.0253943 0.335559 13 16 499 483 148.59 0 - 0.335559 14 16 499 483 137.977 0 - 0.335559 15 16 499 483 128.779 0 - 0.335559 16 16 499 483 120.731 0 - 0.335559 17 16 499 483 113.629 0 - 0.335559 18 16 499 483 107.317 0 - 0.335559 19 16 499 483 101.669 0 - 0.335559 2016-08-01 09:01:15.596789 min lat: 0.00710772 max lat: 1.71721 avg lat: 0.335559 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 20 16 499 483 96.5857 0 - 0.335559 21 16 499 483 91.9864 0 - 0.335559 22 16 499 483 87.8053 0 - 0.335559 23 16 499 483 83.9879 0 - 0.335559 24 16 499 483 80.4884 0 - 0.335559 25 16 499 483 77.269 0 - 0.335559 26 16 499 483 74.2975 0 - 0.335559 27 16 499 483 71.5458 0 - 0.335559 28 16 499 483 68.9906 0 - 0.335559 29 16 499 483 66.6117 0 - 0.335559 30 16 499 483 64.3914 0 - 0.335559 31 16 499 483 62.3143 0 - 0.335559 32 16 499 483 60.367 0 - 0.335559 33 16 499 483 58.5378 0 - 0.335559 34 16 499 483 56.8162 0 - 0.335559 35 16 499 483 55.1928 0 - 0.335559 36 16 499 483 53.6597 0 - 0.335559 37 16 499 483 52.2095 0 - 0.335559 38 16 499 483 50.8356 0 - 0.335559 39 16 499 483 49.5322 0 - 0.335559 2016-08-01 09:01:35.598904 min lat: 0.00710772 max lat: 1.71721 avg lat: 0.335559 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 40 16 499 483 48.2939 0 - 0.335559 41 16 499 483 47.116 0 - 0.335559 42 16 499 483 45.9942 0 - 0.335559 43 16 499 483 44.9246 0 - 0.335559 44 16 499 483 43.9036 0 - 0.335559 45 16 499 483 42.9279 0 - 0.335559 46 16 499 483 41.9948 0 - 0.335559 47 16 499 483 41.1013 0 - 0.335559 48 16 499 483 40.245 0 - 0.335559 49 16 499 483 39.4237 0 - 0.335559 50 16 499 483 38.6352 0 - 0.335559 51 16 499 483 37.8776 0 - 0.335559 52 16 499 483 37.1492 0 - 0.335559 53 16 499 483 36.4483 0 - 0.335559 54 16 499 483 35.7733 0 - 0.335559 55 16 499 483 35.1229 0 - 0.335559 56 16 499 483 34.4957 0 - 0.335559 57 16 499 483 33.8905 0 - 0.335559 58 16 499 483 33.3062 0 - 0.335559 59 16 499 483 32.7417 0 - 0.335559 2016-08-01 09:01:55.601246 min lat: 0.00710772 max lat: 1.71721 avg lat: 0.335559 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 60 16 499 483 32.196 0 - 0.335559 61 16 499 483 31.6682 0 - 0.335559 ^C [root@control01 system_initialize]# ceph -s cluster 9501b36d-0ac0-44ec-b740-9906d092bac2 health HEALTH_OK monmap e1: 1 mons at {control01=192.168.2.11:6789/0} election epoch 2, quorum 0 control01 osdmap e48: 4 osds: 4 up, 4 in flags sortbitwise pgmap v1915: 128 pgs, 1 pools, 48080 MB data, 12021 objects 48393 MB used, 232 TB / 232 TB avail 128 active+clean
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com