Hi,
So I have changed the merge & split settings to:
filestore_merge_threshold = 40
filestore_split_multiple = 8
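For reference, a minimal sketch of how such a change can be applied (assuming FileStore OSDs; the runtime injectargs form only affects future split/merge decisions, and the exact syntax is worth double-checking against the running Ceph release):
[osd]
filestore_merge_threshold = 40
filestore_split_multiple = 8
# ceph tell osd.* injectargs '--filestore_merge_threshold 40 --filestore_split_multiple 8'
If I read the docs correctly, with these values a PG subdirectory splits only after roughly filestore_split_multiple * abs(filestore_merge_threshold) * 16 = 8 * 40 * 16 = 5120 objects, versus 2 * 10 * 16 = 320 with the old defaults.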
A question: although the default.rgw.buckets.data pool, which was the one affected before the above change, now shows higher write bandwidth, the writes are very erratic. Writes to the other pools (both EC and replicated) are erratic as well; before the change, writes to the replicated pools were much more stable.
Reads from pools look fine and stable.
Is this a result of the change above, i.e. is the PG directory structure still being updated (split/merged) in the background, or is something else going on?
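One way to check whether splitting is still happening on disk would be to look at a PG collection directly on an OSD, something along these lines (osd.12 and PG 20.115s1 below are placeholder examples):
# find /var/lib/ceph/osd/ceph-12/current/20.115s1_head -type d -printf '%d\n' | sort -n | tail -1
# for d in $(find /var/lib/ceph/osd/ceph-12/current/20.115s1_head -type d); do echo "$(find "$d" -maxdepth 1 -type f | wc -l) $d"; done | sort -rn | head
The first command shows the deepest nesting level, the second the most populated directories; splits happen per directory and inline with client writes, so directories hovering around the split threshold would fit the erratic write bandwidth. As far as I know, ceph-objectstore-tool also has an apply-layout-settings operation that can re-apply the layout to existing collections offline, though I have not verified it on this release.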
# rados bench -p default.rgw.buckets.data 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194432 bytes to objects of size 4194432 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_sg08-09_19744
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 169 153 611.982 612.019 0.0547835 0.096198
2 16 314 298 595.958 580.018 0.0485904 0.103653
3 16 454 438 583.956 560.017 0.0747779 0.106501
4 16 556 540 539.96 408.012 0.111734 0.110132
5 16 624 608 486.364 272.008 0.111169 0.120221
6 16 651 635 423.302 108.003 0.0706783 0.126086
7 16 730 714 407.97 316.01 0.362838 0.153117
8 16 802 786 392.97 288.009 0.0261249 0.154728
9 16 858 842 374.194 224.007 0.0703766 0.159723
10 16 913 897 358.773 220.007 0.169544 0.173459
Total time run: 10.386646
Total writes made: 913
Write size: 4194432
Object size: 4194432
Bandwidth (MB/sec): 351.616
Stddev Bandwidth: 173.421
Max bandwidth (MB/sec): 612.019
Min bandwidth (MB/sec): 108.003
Average IOPS: 87
Stddev IOPS: 43
Max IOPS: 153
Min IOPS: 27
Average Latency(s): 0.179969
Stddev Latency(s): 0.321098
Max latency(s): 2.60469
Min latency(s): 0.0209669
# rados bench -p default.rgw.buckets.data 10 seq
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 470 455 1818.75 1820.06 0.0462129 0.0337452
Total time run: 1.960388
Total reads made: 913
Read size: 4194432
Object size: 4194432
Bandwidth (MB/sec): 1862.95
Average IOPS: 465
Stddev IOPS: 0
Max IOPS: 455
Min IOPS: 455
Average Latency(s): 0.0335775
Max latency(s): 0.259385
Min latency(s): 0.0169049
#
# rados bench -p default.rgw.buckets.data 10 rand
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 476 461 1843.81 1844.06 0.0272928 0.0335396
2 15 954 939 1877.83 1912.06 0.0434087 0.0332626
3 16 1436 1420 1893.16 1924.06 0.0408819 0.0330526
4 15 1925 1910 1909.84 1960.06 0.0259839 0.0328248
5 15 2408 2393 1914.25 1932.06 0.0427577 0.0328096
6 16 2891 2875 1916.52 1928.06 0.0273633 0.0327565
7 15 3377 3362 1921 1948.06 0.0294795 0.032736
8 16 3875 3859 1929.36 1988.06 0.0242193 0.03258
9 16 4351 4335 1926.53 1904.06 0.0203889 0.0326336
10 16 4855 4839 1935.46 2016.06 0.0325821 0.0324852
Total time run: 10.032589
Total reads made: 4856
Read size: 4194432
Object size: 4194432
Bandwidth (MB/sec): 1936.15
Average IOPS: 484
Stddev IOPS: 11
Max IOPS: 504
Min IOPS: 461
Average Latency(s): 0.0325118
Max latency(s): 0.250729
Min latency(s): 0.0149525
#
# rados bench -p benchmark_erasure_coded 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4202496 bytes to objects of size 4202496 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_sg08-09_20674
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 400 384 1538.92 1539 0.0234887 0.0401751
2 16 824 808 1618.98 1699.31 0.0215378 0.0390947
3 16 1212 1196 1597.6 1555.03 0.0169675 0.0394463
4 16 1551 1535 1537.81 1358.65 0.0605608 0.0413843
5 16 1947 1931 1547.63 1587.09 0.0183851 0.0409588
6 16 2099 2083 1391.21 609.188 0.014569 0.0401053
7 16 2174 2158 1235.4 300.586 0.019966 0.0482541
8 16 2551 2535 1269.82 1510.95 0.0226075 0.0503494
9 16 2727 2711 1207.09 705.375 0.0168164 0.0486592
10 16 3015 2999 1201.79 1154.25 0.0242785 0.0531814
Total time run: 10.038395
Total writes made: 3015
Write size: 4202496
Object size: 4202496
Bandwidth (MB/sec): 1203.73
Stddev Bandwidth: 490.656
Max bandwidth (MB/sec): 1699.31
Min bandwidth (MB/sec): 300.586
Average IOPS: 300
Stddev IOPS: 122
Max IOPS: 424
Min IOPS: 75
Average Latency(s): 0.0532326
Stddev Latency(s): 0.145877
Max latency(s): 1.92627
Min latency(s): 0.0125254
#
# rados bench -p benchmark_erasure_coded 10 seq
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 432 417 1670.33 1671.26 0.0846208 0.0365352
2 16 856 840 1682.65 1695.3 0.0317332 0.0368908
3 16 1260 1244 1661.41 1619.16 0.0143683 0.0376721
4 16 1657 1641 1643.78 1591.1 0.0307071 0.0381582
5 16 2083 2067 1656.44 1707.33 0.0108411 0.0379383
6 15 2492 2477 1654.09 1643.2 0.0612322 0.03794
7 16 2897 2881 1649.07 1619.16 0.0646332 0.038115
Total time run: 7.352938
Total reads made: 3015
Read size: 4202496
Object size: 4202496
Bandwidth (MB/sec): 1643.36
Average IOPS: 410
Stddev IOPS: 10
Max IOPS: 426
Min IOPS: 397
Average Latency(s): 0.0383485
Max latency(s): 0.232213
Min latency(s): 0.0086436
#
# rados bench -p benchmark_erasure_coded 10 rand
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 443 428 1714.95 1715.34 0.149024 0.0358012
2 15 871 856 1715 1715.34 0.0134417 0.0360376
3 16 1314 1298 1733.73 1771.45 0.0096009 0.0359534
4 15 1739 1724 1727.07 1707.33 0.0186676 0.0362362
5 16 2146 2130 1707.04 1627.17 0.0320369 0.0367355
6 16 2577 2561 1710.39 1727.37 0.0112118 0.036779
7 16 2998 2982 1707.06 1687.29 0.0110222 0.0367852
8 16 3477 3461 1733.61 1919.74 0.0420042 0.0362533
9 16 3897 3881 1727.99 1683.28 0.0124048 0.0363658
10 16 4320 4304 1724.7 1695.3 0.011312 0.0364624
Total time run: 10.043213
Total reads made: 4320
Read size: 4202496
Object size: 4202496
Bandwidth (MB/sec): 1723.93
Average IOPS: 430
Stddev IOPS: 19
Max IOPS: 479
Min IOPS: 406
Average Latency(s): 0.0365779
Max latency(s): 0.261305
Min latency(s): 0.00883435
#
# rados bench -p volumes 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_sg08-09_21005
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 324 308 1231.94 1232 0.0390295 0.049679
2 16 612 596 1191.88 1152 0.0245875 0.0527361
3 16 874 858 1143.87 1048 0.0223238 0.0553406
4 16 1144 1128 1127.86 1080 0.0715606 0.0558464
5 16 1388 1372 1097.46 976 0.0279251 0.0571859
6 16 1615 1599 1065.87 908 0.159919 0.0584934
7 16 1849 1833 1047.3 936 0.0167605 0.0601535
8 16 2034 2018 1008.87 740 0.0438302 0.0628943
9 16 2265 2249 999.427 924 0.035679 0.0632574
10 16 2499 2483 993.071 936 0.0244276 0.0640876
Total time run: 10.140100
Total writes made: 2500
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 986.184
Stddev Bandwidth: 139.753
Max bandwidth (MB/sec): 1232
Min bandwidth (MB/sec): 740
Average IOPS: 246
Stddev IOPS: 34
Max IOPS: 308
Min IOPS: 185
Average Latency(s): 0.0645551
Stddev Latency(s): 0.0796484
Max latency(s): 0.860657
Min latency(s): 0.012328
#
# rados bench -p volumes 10 seq
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 520 505 2018.81 2020 0.12669 0.0305952
2 16 1021 1005 2009.25 2000 0.0270969 0.0310014
3 16 1527 1511 2014.08 2024 0.0567405 0.0307781
4 16 2038 2022 2021.5 2044 0.0113705 0.0307111
Total time run: 4.929016
Total reads made: 2500
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2028.8
Average IOPS: 507
Stddev IOPS: 4
Max IOPS: 511
Min IOPS: 500
Average Latency(s): 0.0308263
Max latency(s): 0.252516
Min latency(s): 0.00706628
#
# rados bench -p volumes 10 rand
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 535 519 2075.65 2076 0.0126651 0.029495
2 16 1035 1019 2037.64 2000 0.0522776 0.0302652
3 16 1570 1554 2071.65 2140 0.0255347 0.0301478
4 16 2092 2076 2075.66 2088 0.0201707 0.0301096
5 15 2627 2612 2089.27 2144 0.0361379 0.0299093
6 16 3148 3132 2087.69 2080 0.0506905 0.029938
7 16 3660 3644 2081.99 2048 0.0157957 0.0299908
8 16 4178 4162 2080.71 2072 0.0105906 0.0300153
9 16 4718 4702 2089.49 2160 0.0144684 0.0299537
10 16 5257 5241 2096.11 2156 0.0157336 0.029869
Total time run: 10.028729
Total reads made: 5257
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2096.78
Average IOPS: 524
Stddev IOPS: 13
Max IOPS: 540
Min IOPS: 500
Average Latency(s): 0.0298879
Max latency(s): 0.509513
Min latency(s): 0.00704302
#
# rados bench -p benchmark_replicated 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_sg08-09_21089
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 218 202 807.933 808 0.0375772 0.0725827
2 15 418 403 805.897 804 0.0287712 0.0739311
3 16 585 569 758.578 664 0.0184545 0.0799763
4 16 766 750 749.915 724 0.0258367 0.078744
5 16 975 959 767.115 836 0.871661 0.0792439
6 16 1129 1113 741.915 616 0.0412471 0.0793611
7 16 1276 1260 719.916 588 0.0162774 0.0804685
8 15 1502 1487 743.415 908 0.0186694 0.0778871
9 16 1687 1671 742.583 736 0.0158112 0.0790996
10 16 1832 1816 726.316 580 1.83527 0.07856
11 15 1832 1817 660.652 4 0.120861 0.0785833
12 15 1832 1817 605.599 0 - 0.0785833
13 15 1832 1817 559.015 0 - 0.0785833
14 15 1832 1817 519.085 0 - 0.0785833
15 15 1832 1817 484.48 0 - 0.0785833
16 15 1832 1817 454.201 0 - 0.0785833
17 8 1832 1824 429.13 4.66667 8.16068 0.109725
Total time run: 17.755495
Total writes made: 1832
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 412.717
Stddev Bandwidth: 377.473
Max bandwidth (MB/sec): 908
Min bandwidth (MB/sec): 0
Average IOPS: 103
Stddev IOPS: 94
Max IOPS: 227
Min IOPS: 0
Average Latency(s): 0.146787
Stddev Latency(s): 0.789571
Max latency(s): 13.2518
Min latency(s): 0.0117266
#
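The benchmark_replicated write run above is the most worrying one: cur MB/s drops to 0 for several seconds and max latency reaches ~13 s. One way to correlate such stalls with slow ops would be to watch the cluster and the primary OSDs while the bench is running, e.g. (osd.12 is a placeholder id):
# ceph health detail | grep -i slow
# ceph daemon osd.12 dump_ops_in_flight
# ceph daemon osd.12 dump_historic_ops
If the blocked ops land on OSDs serving that pool while their PG directories are being split, that would point back at the directory re-layout rather than at the drives.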
# rados bench -p benchmark_replicated 10 seq
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 496 480 1919.58 1920 0.0111955 0.0289794
2 15 979 964 1927.46 1936 0.0204766 0.0294553
3 16 1425 1409 1878.23 1780 0.0209146 0.027708
4 7 1832 1825 1824.63 1664 0.0286691 0.0330634
5 7 1832 1825 1459.73 0 - 0.0330634
Total time run: 5.593011
Total reads made: 1832
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1310.21
Average IOPS: 327
Stddev IOPS: 205
Max IOPS: 484
Min IOPS: 0
Average Latency(s): 0.0399277
Max latency(s): 2.54257
Min latency(s): 0.00632494
#
# rados bench -p benchmark_replicated 10 rand
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 515 500 1999.26 2000 0.0260058 0.0289045
2 16 1008 992 1983.45 1968 0.00914651 0.0297668
3 16 1512 1496 1994.23 2016 0.0163293 0.0309669
4 16 1996 1980 1979.6 1936 0.0123961 0.0313833
5 15 2486 2471 1976.43 1964 0.0318256 0.0312294
6 16 2992 2976 1983.64 2020 0.0346031 0.0313301
7 15 3498 3483 1989.94 2028 0.0119796 0.0314029
8 16 4018 4002 2000.65 2076 0.0374133 0.0312428
9 16 4558 4542 2018.33 2160 0.024143 0.0308669
10 15 5101 5086 2034.07 2176 0.0317191 0.0307552
Total time run: 10.032364
Total reads made: 5101
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2033.82
Average IOPS: 508
Stddev IOPS: 20
Max IOPS: 544
Min IOPS: 484
Average Latency(s): 0.0307879
Max latency(s): 1.3466
Min latency(s): 0.00688148
#
Regards
Jakub
On Thu, Feb 1, 2018 at 3:33 PM, Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx> wrote:
Regarding split & merge, I have default valuesfilestore_merge_threshold = 10filestore_split_multiple = 2according to https://bugzilla.redhat.com/show_bug.cgi?id=1219974 the recommended values arefilestore_merge_threshold = 40filestore_split_multiple = 8Is it something that I can easily change to default or lower values than proposed in case of further performance degradation ?I did tests of 4 pools: 2 replicated pools (x3 ) and 2 EC pools (k=6,m=3)The pool with the lowest bandwidth has osd tree structure like├── 20.115s1_head│ └── DIR_5│ └── DIR_1│ ├── DIR_1│ │ ├── DIR_0│ │ ├── DIR_1│ │ ├── DIR_2│ │ │ ├── DIR_0│ │ │ ├── DIR_1│ │ │ ├── DIR_2│ │ │ ├── DIR_3│ │ │ ├── DIR_4│ │ │ ├── DIR_5│ │ │ ├── DIR_6│ │ │ ├── DIR_7│ │ │ ├── DIR_8│ │ │ ├── DIR_9│ │ │ ├── DIR_A│ │ │ ├── DIR_B│ │ │ ├── DIR_C│ │ │ ├── DIR_D│ │ │ ├── DIR_E│ │ │ └── DIR_FTests results# rados bench -p default.rgw.buckets.data 10 writehints = 1Maintaining 16 concurrent writes of 4194432 bytes to objects of size 4194432 for up to 10 seconds or 0 objectsObject prefix: benchmark_data_sg08-09_180679sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)0 0 0 0 0 0 - 01 16 129 113 451.975 452.014 0.0376714 0.1280272 16 209 193 385.964 320.01 0.119609 0.1385173 16 235 219 291.974 104.003 0.0337624 0.137314 16 235 219 218.981 0 - 0.137315 16 266 250 199.983 62.0019 0.111673 0.2384246 16 317 301 200.649 204.006 0.0340569 0.2984897 16 396 380 217.124 316.01 0.0379956 0.2834588 16 444 428 213.981 192.006 0.0304383 0.2741939 16 485 469 208.426 164.005 0.391956 0.28342110 16 496 480 191.983 44.0013 0.104497 0.29207411 16 497 481 174.894 4.00012 0.999985 0.29354512 16 497 481 160.32 0 - 0.29354513 16 497 481 147.987 0 - 0.29354514 16 497 481 137.417 0 - 0.293545Total time run: 14.493353Total writes made: 497Write size: 4194432Object size: 4194432Bandwidth (MB/sec): 137.171Stddev Bandwidth: 147.001Max bandwidth (MB/sec): 452.014Min bandwidth (MB/sec): 0Average IOPS: 34Stddev IOPS: 36Max IOPS: 113Min IOPS: 0Average Latency(s): 0.464281Stddev Latency(s): 1.09388Max latency(s): 6.3723Min latency(s): 0.023835Cleaning up (deleting benchmark objects)Removed 497 objectsClean up completed and total clean up time :10.622382## rados bench -p benchmark_erasure_coded 10 writehints = 1Maintaining 16 concurrent writes of 4202496 bytes to objects of size 4202496 for up to 10 seconds or 0 objectsObject prefix: benchmark_data_sg08-09_180807sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)0 0 0 0 0 0 - 01 16 424 408 1635.11 1635.19 0.0490434 0.03796162 16 828 812 1627.03 1619.16 0.0616501 0.03884673 16 1258 1242 1659.06 1723.36 0.0304412 0.03845374 16 1659 1643 1646.03 1607.13 0.0155402 0.03873515 16 2053 2037 1632.61 1579.08 0.0453354 0.03902366 16 2455 2439 1629 1611.14 0.0485313 0.03923767 16 2649 2633 1507.34 777.516 0.0148972 0.03931618 16 2858 2842 1423.61 837.633 0.0157639 0.04490889 16 3245 3229 1437.75 1551.02 0.0200845 0.044484710 16 3629 3613 1447.85 1539 0.0654451 0.0441569Total time run: 10.229591Total writes made: 3630Write size: 4202496Object size: 4202496Bandwidth (MB/sec): 1422.18Stddev Bandwidth: 341.609Max bandwidth (MB/sec): 1723.36Min bandwidth (MB/sec): 777.516Average IOPS: 354Stddev IOPS: 85Max IOPS: 430Min IOPS: 194Average Latency(s): 0.0448612Stddev Latency(s): 0.0712224Max latency(s): 1.08353Min latency(s): 0.0134629Cleaning up (deleting benchmark objects)Removed 3630 objectsClean up completed and total clean up time :2.321669## rados bench -p volumes 10 writehints = 1Maintaining 16 concurrent writes of 
4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objectsObject prefix: benchmark_data_sg08-09_180651sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)0 0 0 0 0 0 - 01 16 336 320 1279.89 1280 0.0309006 0.04725242 16 653 637 1273.84 1268 0.0465151 0.04957013 16 956 940 1253.17 1212 0.0337327 0.05041464 16 1256 1240 1239.85 1200 0.0177263 0.05091455 16 1555 1539 1231.05 1196 0.0364991 0.05167246 16 1868 1852 1234.51 1252 0.0260964 0.05102367 16 2211 2195 1254.13 1372 0.040738 0.0508478 16 2493 2477 1238.35 1128 0.0228582 0.05149799 16 2838 2822 1254.07 1380 0.0265224 0.050864110 16 3116 3100 1239.85 1112 0.0160151 0.0513104Total time run: 10.192091Total writes made: 3117Write size: 4194304Object size: 4194304Bandwidth (MB/sec): 1223.3Stddev Bandwidth: 89.9383Max bandwidth (MB/sec): 1380Min bandwidth (MB/sec): 1112Average IOPS: 305Stddev IOPS: 22Max IOPS: 345Min IOPS: 278Average Latency(s): 0.0518144Stddev Latency(s): 0.0529575Max latency(s): 0.663523Min latency(s): 0.0122169Cleaning up (deleting benchmark objects)Removed 3117 objectsClean up completed and total clean up time :0.212296## rados bench -p benchmark_replicated 10 writehints = 1Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objectsObject prefix: benchmark_data_sg08-09_180779sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)0 0 0 0 0 0 - 01 16 309 293 1171.94 1172 0.0233267 0.05088772 16 632 616 1231.87 1292 0.0258237 0.0496123 16 959 943 1257.19 1308 0.0335615 0.04834644 16 1276 1260 1259.85 1268 0.031461 0.05046895 16 1643 1627 1301.44 1468 0.0274032 0.04896516 16 1991 1975 1316.51 1392 0.0408116 0.04835967 16 2328 2312 1320.98 1348 0.0242298 0.0481758 16 2677 2661 1330.33 1396 0.097513 0.0479629 16 3042 3026 1344.72 1460 0.0196724 0.047407810 16 3384 3368 1347.03 1368 0.0426199 0.0472573Total time run: 10.482871Total writes made: 3384Write size: 4194304Object size: 4194304Bandwidth (MB/sec): 1291.25Stddev Bandwidth: 90.4861Max bandwidth (MB/sec): 1468Min bandwidth (MB/sec): 1172Average IOPS: 322Stddev IOPS: 22Max IOPS: 367Min IOPS: 293Average Latency(s): 0.048763Stddev Latency(s): 0.0547666Max latency(s): 0.938211Min latency(s): 0.0121556Cleaning up (deleting benchmark objects)Removed 3384 objectsClean up completed and total clean up time :0.239684#Luis why did you advise against increasing pg_num pgp_num ? I'm wondering which option is better: increasing pg_num or filestore_merge_threshold and filestore_split_multiple ?ThanksJakubOn Thu, Feb 1, 2018 at 9:38 AM, Jaroslaw Owsiewski <jaroslaw.owsiewski@xxxxxxxxxx> wrote:Hi,maybe "split is on the floor"?Regards--Jarek