Hi,
we are currently testing ways to increase Ceph performance, because what we are seeing so far is very close to unusable.
For the test cluster we are using 4 nodes with the following hardware:
Dual 200 GbE Mellanox Ethernet
2x EPYC Rome 7302
16x 32GB 3200MHz ECC
9x 15.36TB Micron 9300 Pro
For production this will be extended to all 8 nodes if it shows
promising results.
- Ceph was installed with cephadm.
- MDS and OSD daemons are located on the same nodes.
- Mostly using the stock config (see the config check sketch after this list).
- Network performance tested with iperf3 looks fine: ~26 Gbit/s with -P4 on a single port (details below), and close to 200 Gbit/s with 10 parallel iperf3 instances and servers.
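To verify the "mostly stock config" point we plan to list anything that deviates from the defaults; a minimal sketch (osd.0 is just an example daemon, and the second command has to be run on the host of that daemon):
# ceph config dump                 # settings stored in the mon config database
# ceph daemon osd.0 config diff    # differences from the compiled-in defaults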
When testing a mounted CephFS on the worker nodes in various configurations, I only get <50 MB/s with the FUSE mount and <270 MB/s with the kernel mount (dd commands and output attached below).
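As an additional cross-check we intend to run a multi-threaded fio job against the same mounts, to rule out dd's single-threaded direct I/O as the limiting factor; a minimal sketch (file name, size and job count are placeholders):
# fio --name=cephfs-seqwrite --directory=/mnt/cephfs/backup \
      --rw=write --bs=1M --size=4G --numjobs=4 \
      --ioengine=libaio --iodepth=16 --direct=1 --group_reporting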
In addition, the Ceph dashboard and our Grafana monitoring report packet loss on all relevant interfaces during load, which does not occur during the normal iperf3 load tests or rsync/scp file transfers.
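To narrow that down we intend to look at the NIC counters and the MTU path directly, roughly along these lines (the interface name is a placeholder for our Mellanox ports):
# ip -s link show dev enp1s0f0                        # RX/TX errors and drops
# ethtool -S enp1s0f0 | grep -iE 'drop|discard|pause'
# ping -M do -s 8972 ml2ran08s0                       # verify a 9000-byte MTU end to end, if jumbo frames are enabled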
rados bench shows roughly 1600-3100 MB/s depending on the workload, which is not the maximum performance of the SSDs but fine for us (details below).
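We may also repeat the write benchmark with more in-flight operations, since rados bench defaults to 16 concurrent 4 MB writes; a minimal sketch (the thread count of 64 is arbitrary):
# rados bench -p testbench 10 write -t 64 --no-cleanup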
Why is the filesystem so slow compared to the individual components?
Cheers
Dominik
Test details:
------------------------------------------------------------------------------------------------------
Some tests done on the worker nodes:
CephFS mounted with ceph-fuse:
root@ml2ran10:/mnt/cephfs/backup# dd if=/dev/zero of=testfile bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 88.2933 s, 48.6 MB/s
CephFS mounted with the kernel driver:
root@ml2ran06:/mnt/cephfs/backup# dd if=/dev/zero of=testfile bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.0989 s, 267 MB/s
Storage node:
With ceph-fuse:
root@ml2rsn05:/mnt/ml2r_storage/backup# dd if=/dev/zero of=testfile bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 53.9977 s, 79.5 MB/s
Kernel mount:
dd if=/dev/zero of=testfile bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.6726 s, 243 MB/s
_______________________________________________________
Iperf3
iperf3 --zerocopy -n 10240M -P4 -c ml2ran08s0 -p 4701 -i 15 -b 200000000000
Connecting to host ml2ran08s0, port 4701
[ 5] local 129.217.31.180 port 43958 connected to 129.217.31.218 port 4701
[ 7] local 129.217.31.180 port 43960 connected to 129.217.31.218 port 4701
[ 9] local 129.217.31.180 port 43962 connected to 129.217.31.218 port 4701
[ 11] local 129.217.31.180 port 43964 connected to 129.217.31.218 port 4701
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-3.21 sec 2.50 GBytes 6.69 Gbits/sec 0 632 KBytes
[ 7] 0.00-3.21 sec 2.50 GBytes 6.70 Gbits/sec 0 522 KBytes
[ 9] 0.00-3.21 sec 2.50 GBytes 6.69 Gbits/sec 0 612 KBytes
[ 11] 0.00-3.21 sec 2.50 GBytes 6.69 Gbits/sec 0 430 KBytes
[SUM] 0.00-3.21 sec 10.0 GBytes 26.8 Gbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-3.21 sec 2.50 GBytes 6.69 Gbits/sec 0 sender
[ 5] 0.00-3.21 sec 2.50 GBytes 6.67 Gbits/sec   receiver
[ 7] 0.00-3.21 sec 2.50 GBytes 6.70 Gbits/sec 0 sender
[ 7] 0.00-3.21 sec 2.50 GBytes 6.67 Gbits/sec   receiver
[ 9] 0.00-3.21 sec 2.50 GBytes 6.69 Gbits/sec 0 sender
[ 9] 0.00-3.21 sec 2.49 GBytes 6.67 Gbits/sec   receiver
[ 11] 0.00-3.21 sec 2.50 GBytes 6.69 Gbits/sec 0 sender
[ 11] 0.00-3.21 sec 2.50 GBytes 6.67 Gbits/sec   receiver
[SUM] 0.00-3.21 sec 10.0 GBytes 26.8 Gbits/sec 0 sender
[SUM] 0.00-3.21 sec 9.98 GBytes 26.7 Gbits/sec   receiver
_________________________________________________________________
Rados Bench on storage node:
# rados bench -p testbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_ml2rsn05_2829244
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 747 731 2923.84 2924 0.0178757 0.0216262
2 16 1506 1490 2979.71 3036 0.0308664 0.0213685
3 16 2267 2251 3000.99 3044 0.0259053 0.0212556
4 16 3058 3042 3041.62 3164 0.0227621 0.0209792
5 16 3850 3834 3066.8 3168 0.0130519 0.0208148
6 16 4625 4609 3072.26 3100 0.151371 0.0207904
7 16 5381 5365 3065.28 3024 0.0300368 0.0208345
8 16 6172 6156 3077.57 3164 0.0197728 0.0207714
9 16 6971 6955 3090.67 3196 0.0142751 0.0206786
10 14 7772 7758 3102.76 3212 0.0181034 0.020605
Total time run: 10.0179
Total writes made: 7772
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 3103.23
Stddev Bandwidth: 93.3676
Max bandwidth (MB/sec): 3212
Min bandwidth (MB/sec): 2924
Average IOPS: 775
Stddev IOPS: 23.3419
Max IOPS: 803
Min IOPS: 731
Average Latency(s): 0.020598
Stddev Latency(s): 0.00731743
Max latency(s): 0.151371
Min latency(s): 0.00966991
# rados bench -p testbench 10 seq
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 657 642 2567.32 2568 0.011104 0.022631
2 15 1244 1229 2456.9 2348 0.0115248 0.019485
3 16 1499 1483 1976.64 1016 0.00722983
0.0177887
4 15 1922 1907 1906.16 1696 0.0142242 0.0327382
5 15 2593 2578 2061.6 2684 0.011758 0.0301774
6 16 3142 3126 2083.23 2192 0.00915926 0.027478
7 16 3276 3260 1862.23 536 0.00824714 0.0267449
8 16 3606 3590 1794.43 1320 0.0118938 0.0350541
9 16 4293 4277 1900.32 2748 0.0301886 0.0330604
10 14 5003 4989 1995.04 2848 0.0389717 0.0314977
Total time run: 10.0227
Total reads made: 5003
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1996.67
Average IOPS: 499
Stddev IOPS: 202.3
Max IOPS: 712
Min IOPS: 134
Average Latency(s): 0.0314843
Max latency(s): 3.04463
Min latency(s): 0.00551523
# rados bench -p testbench 10 rand
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 15 15 0 0 0 - 0
1 15 680 665 2657.61 2660 0.00919807 0.0224833
2 15 1273 1258 2514.26 2372 0.00839656 0.0247125
3 16 1863 1847 2461.4 2356 0.00994467 0.0236565
4 16 2064 2048 2047.14 804 0.00809139 0.0223506
5 16 2064 2048 1637.79 0 - 0.0223506
6 16 2477 2461 1640.12 826 0.0286315 0.0383254
7 16 3102 3086 1762.89 2500 0.0267464 0.0349189
8 16 3513 3497 1748 1644 0.00890952 0.032269
9 16 3617 3601 1600 416 0.00626917 0.0316019
10 15 4014 3999 1599.18 1592 0.0461076 0.0393606
Total time run: 10.0481
Total reads made: 4014
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1597.91
Average IOPS: 399
Stddev IOPS: 239.089
Max IOPS: 665
Min IOPS: 0
Average Latency(s): 0.0394035
Max latency(s): 3.00962
Min latency(s): 0.00449537