Hello,
We are facing a performance issue with rados bench on a 5-node cluster when comparing pg_num 4096 vs 8192.
As per the PG calculation, our specification is below:
Size (k+m) | OSD # | %Data | Target PGs per OSD | PG count |
---|---|---|---|---|
5 | 340 | 100 | 100 | 8192 |
5 | 340 | 100 | 50 | 4096 |
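For reference, these PG counts match the usual pgcalc arithmetic: total PGs ≈ (OSD count * target PGs per OSD) / pool size, rounded up to the next power of two. A minimal sketch of that calculation (this is the standard formula as I understand it, not something quoted from this thread):

# Hedged sketch of the standard PG calculation
# total_pgs ~= (num_osds * target_pgs_per_osd) / pool_size, rounded up to a power of two
num_osds=340
pool_size=5                      # EC 4+1 -> k+m = 5
for target in 100 50; do
    raw=$(( num_osds * target / pool_size ))
    pgs=1
    while [ "$pgs" -lt "$raw" ]; do pgs=$(( pgs * 2 )); done
    echo "target=$target: raw=$raw -> pg_num=$pgs"
done
# prints: target=100: raw=6800 -> pg_num=8192
#         target=50:  raw=3400 -> pg_num=4096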
We got better performance with 4096 PGs compared to 8192.
With PG count - 4096
====================
File size (bytes) | 256000 | 512000 | 1024000 | 2048000 | 4096000 | 12288000 |
---|---|---|---|---|---|---|
Write Bandwidth MB/sec | 1448.38 | 2503.98 | 3941.42 | 5354.7 | 5333.9 | 5271.16 |
Read Bandwidth MB/sec | 2924.83 | 3417.9 | 4236.65 | 4469.4 | 4602.65 | 4584.6 |
WRITE Average Latency seconds | 0.088355 | 0.102214 | 0.129855 | 0.191155 | 0.377685 | 1.13953 |
WRITE Maximum Latency seconds | 0.280164 | 0.485391 | 1.15953 | 13.5175 | 27.9876 | 86.3103 |
READ Average Latency seconds | 0.0437188 | 0.0747644 | 0.120604 | 0.228535 | 0.436566 | 1.30415 |
READ Maximum Latency seconds | 1.13067 | 3.21548 | 2.99734 | 4.08429 | 9.0224 | 16.6047 |
Average IOPS:
#grep "op/s" cephio_0%.txt | awk 'NF { print $(NF - 1) }' | awk '{ total += $0 } END { print total/NR }'
7517.49
With PG count - 8192
====================
File size (bytes) | 256000 | 512000 | 1024000 | 2048000 | 4096000 | 12288000 |
---|---|---|---|---|---|---|
Write Bandwidth MB/sec | 534.749 | 1020.49 | 1864.58 | 3100.92 | 4717.23 | 5251.76 |
Read Bandwidth MB/sec | 1615.56 | 2764.25 | 4061.55 | 4265.39 | 4229.38 | 4042.18 |
WRITE Average Latency seconds | 0.239263 | 0.250769 | 0.27448 | 0.328981 | 0.427056 | 1.14352 |
WRITE Maximum Latency seconds | 9.21752 | 10.3353 | 10.8132 | 11.2135 | 12.5497 | 44.8133 |
READ Average Latency seconds | 0.0791822 | 0.0925167 | 0.12583 | 0.239571 | 0.475198 | 1.47916 |
READ Maximum Latency seconds | 2.01021 | 2.29139 | 3.60456 | 3.8435 | 7.43755 | 37.6106 |
#grep "op/s" cephio_0%.txt | awk 'NF { print $(NF - 1) }'| awk '{ total += $0 } END { print total/NR }'
4970.26
With 4096 PG - Average IOPS - 7517
With 8192 PG - Average IOPS - 4970
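In case anyone wants to reproduce the numbers, a loop along the lines of the sketch below would generate results in the shape of the tables above. The pool name, the 60-second run length and the log file names are placeholders of mine; only the object sizes come from the tables.

# Hedged sketch only -- the exact rados bench invocation is not shown above.
# "testpool", the 60 s duration and the log names are placeholders.
for sz in 256000 512000 1024000 2048000 4096000 12288000; do
    rados bench -p testpool 60 write -b $sz --no-cleanup 2>&1 | tee write_${sz}.txt
    rados bench -p testpool 60 seq 2>&1 | tee seq_${sz}.txt
    rados -p testpool cleanup
done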
With 8192 PGs, performance for the smaller object sizes is badly affected. We will not be adding any nodes in the future, so we generally select 'Target PGs per OSD' as 100 instead of 200/300.
Any comments on how to choose the appropriate PG count for a cluster of this size would be appreciated.
ENV:
Kraken - 11.2.0 - bluestore EC 4+1
RHEL 7.3
3.10.0-514.10.2.el7.x86_64
5 nodes - 5 x 68 = 340 OSDs
Thanks