Hello,
We are facing a performance issue with rados bench on a 5-node cluster when comparing pg_num 4096 vs 8192.
As per the PG calculation, our specification is below:
Size (k+m) | OSD # | %Data | Target PGs per OSD | PG count |
---|---|---|---|---|
5 | 340 | 100 | 100 | 8192 |
5 | 340 | 100 | 50 | 4096 |
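For reference, these PG counts match the usual pgcalc arithmetic: total PGs ≈ (OSD count * target PGs per OSD) / pool size, rounded up to the next power of two. A minimal sketch of that calculation (this is the standard formula as I understand it, not something quoted from this thread):

# Hedged sketch of the standard PG calculation
# total_pgs ~= (num_osds * target_pgs_per_osd) / pool_size, rounded up to a power of two
num_osds=340
pool_size=5                      # EC 4+1 -> k+m = 5
for target in 100 50; do
    raw=$(( num_osds * target / pool_size ))
    pgs=1
    while [ "$pgs" -lt "$raw" ]; do pgs=$(( pgs * 2 )); done
    echo "target=$target: raw=$raw -> pg_num=$pgs"
done
# prints: target=100: raw=6800 -> pg_num=8192
#         target=50:  raw=3400 -> pg_num=4096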
We got better performance with 4096 PGs compared to 8192.
With PG count - 4096
====================
File size (bytes) | 256000 | 512000 | 1024000 | 2048000 | 4096000 | 12288000 |
---|---|---|---|---|---|---|
Write Bandwidth MB/sec | 1448.38 | 2503.98 | 3941.42 | 5354.7 | 5333.9 | 5271.16 |
Read Bandwidth MB/sec | 2924.83 | 3417.9 | 4236.65 | 4469.4 | 4602.65 | 4584.6 |
WRITE Average Latency seconds | 0.088355 | 0.102214 | 0.129855 | 0.191155 | 0.377685 | 1.13953 |
WRITE Maximum Latency seconds | 0.280164 | 0.485391 | 1.15953 | 13.5175 | 27.9876 | 86.3103 |
READ Average Latency seconds | 0.0437188 | 0.0747644 | 0.120604 | 0.228535 | 0.436566 | 1.30415 |
READ Maximum Latency seconds | 1.13067 | 3.21548 | 2.99734 | 4.08429 | 9.0224 | 16.6047 |
Average IOPS:
#grep "op/s" cephio_0%.txt | awk 'NF { print $(NF - 1) }' | awk '{ total += $0 } END { print total/NR }'
7517.49
With PG count - 8192
====================
File size (bytes) | 256000 | 512000 | 1024000 | 2048000 | 4096000 | 12288000 |
---|---|---|---|---|---|---|
Write Bandwidth MB/sec | 534.749 | 1020.49 | 1864.58 | 3100.92 | 4717.23 | 5251.76 |
Read Bandwidth MB/sec | 1615.56 | 2764.25 | 4061.55 | 4265.39 | 4229.38 | 4042.18 |
WRITE Average Latency seconds | 0.239263 | 0.250769 | 0.27448 | 0.328981 | 0.427056 | 1.14352 |
WRITE Maximum Latency seconds | 9.21752 | 10.3353 | 10.8132 | 11.2135 | 12.5497 | 44.8133 |
READ Average Latency seconds | 0.0791822 | 0.0925167 | 0.12583 | 0.239571 | 0.475198 | 1.47916 |
READ Maximum Latency seconds | 2.01021 | 2.29139 | 3.60456 | 3.8435 | 7.43755 | 37.6106 |
#grep "op/s" cephio_0%.txt | awk 'NF { print $(NF - 1) }'| awk '{ total += $0 } END { print total/NR }'
4970.26
With 4096 PG - Average IOPS - 7517
With 8192 PG - Average IOPS - 4970
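In case anyone wants to reproduce the numbers, a loop along the lines of the sketch below would generate results in the shape of the tables above. The pool name, the 60-second run length and the log file names are placeholders of mine; only the object sizes come from the tables.

# Hedged sketch only -- the exact rados bench invocation is not shown above.
# "testpool", the 60 s duration and the log names are placeholders.
for sz in 256000 512000 1024000 2048000 4096000 12288000; do
    rados bench -p testpool 60 write -b $sz --no-cleanup 2>&1 | tee write_${sz}.txt
    rados bench -p testpool 60 seq 2>&1 | tee seq_${sz}.txt
    rados -p testpool cleanup
done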
With 8192 PGs, performance for the smaller object sizes is badly affected. We will not be adding any nodes in the future, so we generally select 'Target PGs per OSD' as 100 instead of 200/300.
Any comments on how to choose the appropriate PG count for a cluster of this size would be appreciated.
ENV:
Kraken - 11.2.0 - bluestore EC 4+1
RHEL 7.3
3.10.0-514.10.2.el7.x86_64
5 nodes - 5 x 68 = 340 OSDs
Thanks