Re: Ceph EC PG calculation

Frank Schilder <frans@xxxxxx> · Wed, 18 Nov 2020 08:11:20 +0000

Roughly speaking, if you have N OSDs, a replication factor of R and aim for P PGs/OSD on average, you can assign (N*P)/R PGs to the pool.

Example: 4+2 EC has replication 6. There are 36 OSDs. If you want to place, say,  50 PGs per OSD, you can assign

(36*50)/6=300 PGs

to the EC pool. You may pick a close power of 2 if you wish and then calculate how many PGs will be placed on each OSD on average. For example, we choose 256 PGs, then

256*6/36 = 42.7 PGs per OSD will be added.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Sent: 18 November 2020 04:58:38
To: ceph-users@xxxxxxx
Subject:  Ceph EC PG calculation

Hi,

I have this error:
I have 36 osd and get this:
Error ERANGE:  pg_num 4096 size 6 would mean 25011 total pgs, which exceeds max 10500 (mon_max_pg_per_osd 250 * num_in_osds 42)

If I want to calculate the max pg in my server, how it works if I have EC pool?

I have 4:2 data EC pool, and the others are replicated.

These are the pools:
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode warn last_change 597 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 598 flags hashpspool stripe_width 0 application rgw
pool 6 'sin.rgw.log' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 599 flags hashpspool stripe_width 0 application rgw
pool 7 'sin.rgw.control' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 600 flags hashpspool stripe_width 0 application rgw
pool 8 'sin.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 601 lfor 0/393/391 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 10 'sin.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 602 lfor 0/529/527 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 11 'sin.rgw.buckets.data.old' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 603 flags hashpspool stripe_width 0 application rgw
pool 12 'sin.rgw.buckets.data' erasure profile data-ec size 6 min_size 5 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 604 flags hashpspool,ec_overwrites stripe_width 16384 application rgw

So how I can calculate the pgs?

This is my osd tree:
ID   CLASS  WEIGHT     TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
-1         534.38354  root default
-5          89.06392      host cephosd-6s01
36   nvme    1.74660          osd.36                up   1.00000  1.00000
  0    ssd   14.55289          osd.0                 up   1.00000  1.00000
  8    ssd   14.55289          osd.8                 up   1.00000  1.00000
15    ssd   14.55289          osd.15                up   1.00000  1.00000
18    ssd   14.55289          osd.18                up   1.00000  1.00000
24    ssd   14.55289          osd.24                up   1.00000  1.00000
30    ssd   14.55289          osd.30                up   1.00000  1.00000
-3          89.06392      host cephosd-6s02
37   nvme    1.74660          osd.37                up   1.00000  1.00000
  1    ssd   14.55289          osd.1                 up   1.00000  1.00000
11    ssd   14.55289          osd.11                up   1.00000  1.00000
17    ssd   14.55289          osd.17                up   1.00000  1.00000
23    ssd   14.55289          osd.23                up   1.00000  1.00000
28    ssd   14.55289          osd.28                up   1.00000  1.00000
35    ssd   14.55289          osd.35                up   1.00000  1.00000
-11          89.06392      host cephosd-6s03
41   nvme    1.74660          osd.41                up   1.00000  1.00000
  2    ssd   14.55289          osd.2                 up   1.00000  1.00000
  6    ssd   14.55289          osd.6                 up   1.00000  1.00000
13    ssd   14.55289          osd.13                up   1.00000  1.00000
19    ssd   14.55289          osd.19                up   1.00000  1.00000
26    ssd   14.55289          osd.26                up   1.00000  1.00000
32    ssd   14.55289          osd.32                up   1.00000  1.00000
-13          89.06392      host cephosd-6s04
38   nvme    1.74660          osd.38                up   1.00000  1.00000
  5    ssd   14.55289          osd.5                 up   1.00000  1.00000
  7    ssd   14.55289          osd.7                 up   1.00000  1.00000
14    ssd   14.55289          osd.14                up   1.00000  1.00000
20    ssd   14.55289          osd.20                up   1.00000  1.00000
25    ssd   14.55289          osd.25                up   1.00000  1.00000
31    ssd   14.55289          osd.31                up   1.00000  1.00000
-9          89.06392      host cephosd-6s05
40   nvme    1.74660          osd.40                up   1.00000  1.00000
  3    ssd   14.55289          osd.3                 up   1.00000  1.00000
10    ssd   14.55289          osd.10                up   1.00000  1.00000
12    ssd   14.55289          osd.12                up   1.00000  1.00000
21    ssd   14.55289          osd.21                up   1.00000  1.00000
29    ssd   14.55289          osd.29                up   1.00000  1.00000
33    ssd   14.55289          osd.33                up   1.00000  1.00000
-7          89.06392      host cephosd-6s06
39   nvme    1.74660          osd.39                up   1.00000  1.00000
  4    ssd   14.55289          osd.4                 up   1.00000  1.00000
  9    ssd   14.55289          osd.9                 up   1.00000  1.00000
16    ssd   14.55289          osd.16                up   1.00000  1.00000
22    ssd   14.55289          osd.22                up   1.00000  1.00000
27    ssd   14.55289          osd.27                up   1.00000  1.00000
34    ssd   14.55289          osd.34                up   1.00000  1.00000

This is the crush rules:
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "replicated_nvme",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -21,
                "item_name": "default~nvme"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 2,
        "rule_name": "replicated_ssd",
        "ruleset": 2,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 3,
        "rule_name": "sin.rgw.buckets.data.new",
        "ruleset": 3,
        "type": 3,
        "min_size": 3,
        "max_size": 6,
        "steps": [
            {
                "op": "set_chooseleaf_tries",
                "num": 5
            },
            {
                "op": "set_choose_tries",
                "num": 100
            },
            {
                "op": "take",
                "item": -2,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_indep",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

So everything else rather than the data pool are on SSD and nvme with replica 3.
If I calculate the pg in the ec like 36osd*100/6=600 which means the max pg in the EC pool is 512?
But how this affect the SSD replica pools then?

This is the EC pool definition:
crush-device-class=ssd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8

Thank you in advance.

________________________________
This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx