Sorry, I was missing the pg dump:
2.1 0 0 0 0 0 0 0 0 stale+peering 2018-07-26 19:38:13.381673 0'0 125:9 [3] 3 [3] 3 0'0 2018-07-26 15:20:08.965357 0'0 2018-07-26 15:20:08.965357 0
2.0 0 0 0 0 0 0 0 0 stale+peering 2018-07-26 19:38:13.345341 0'0 125:13 [3] 3 [3] 3 0'0 2018-07-26 15:20:08.965357 0'0 2018-07-26 15:20:08.965357 0
2 0 0 0 0 0 0 0 0
sum 0 0 0 0 0 0 0 0
OSD_STAT USED AVAIL TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
3 1051M 1861G 1863G [0,1,2] 256 256
2 1051M 1861G 1863G [0,1,3] 0 0
1 1051M 3724G 3726G [0,2,3] 0 0
0 1051M 1861G 1863G [1,2,3] 0 0
sum 4205M 9310G 9315G
For some reason it seems that some PGs are allocated to osd 3 (but stale+peering).
This is kind of odd.
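One way to double-check that observation is to tally the UP-set column of the saved dump. This is just a sketch: `pg_dump.txt` is a hypothetical file holding the `ceph pg dump` output above, and the field position of the UP set (field 15) is assumed from that output's column layout.

```shell
# Tally UP-set OSDs for the stale+peering PGs from a saved `ceph pg dump`.
# Assumes the column layout shown above: field 15 is the UP set, e.g. "[3]".
awk '/stale\+peering/ {
    gsub(/[\[\]]/, "", $15)          # strip brackets around the OSD id(s)
    count[$15]++
}
END {
    for (osd in count)
        printf "osd.%s: %d stale+peering PGs\n", osd, count[osd]
}' pg_dump.txt
```

On the two PG lines above this would report both stale+peering PGs as mapped to osd.3, matching the OSD_STAT table.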
On Thu, 26 Jul 2018 at 20:50, Benoit Hudzia <benoit@xxxxxxxxxxxxxxx> wrote:
You are correct, the PGs are stale (not allocated):

[root@stratonode1 /]# ceph status
  cluster:
    id:     ea0df043-7b25-4447-a43d-e9b2af8fe069
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive, 256 pgs peering, 256 pgs stale

  services:
    mon: 3 daemons, quorum stratonode1.node.strato,stratonode2.node.strato,stratonode0.node.strato
    mgr: stratonode1(active), standbys: stratonode2, stratonode3
    osd: 4 osds: 4 up, 4 in

  data:
    pools:   1 pools, 256 pgs
    objects: 0 objects, 0 bytes
    usage:   4192 MB used, 9310 GB / 9315 GB avail
    pgs:     100.000% pgs not active
             256 stale+peering

PG dump: it shows all PGs in stale+peering. However, it's kind of strange that it shows some PGs associated with OSD 3. So it seems that the PG-count check is not taking the ruleset into account...

Do you think that changing "osd max pg per osd hard ratio" to a huge number (1M) would be a valid temporary workaround? We always allocate pools with dedicated OSDs using the device class ruleset, so we never have pools sharing OSDs.
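The limit that rejects the pool creation here is `mon_max_pg_per_osd` on the monitors; `osd_max_pg_per_osd_hard_ratio` is the related hard multiplier applied on the OSD side. As a stop-gap, raising the mon-side limit should let the second pool be created. A sketch, assuming a Luminous-era ceph.conf (the value is illustrative, and whether relaxing it is safe long-term is exactly the question for the bug report):

```ini
# ceph.conf on the monitor nodes (illustrative value; restart or injectargs needed)
[mon]
# default is 200; raise it so 1024 projected PG instances across 4 OSDs
# passes the 1024 <= mon_max_pg_per_osd * num_in_osds check
mon_max_pg_per_osd = 300
```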
I'll open a bug with ceph regarding the pg creation check ignoring the crush ruleset.

On Thu, 26 Jul 2018 at 17:11, John Spray <jspray@xxxxxxxxxx> wrote:

> On Thu, Jul 26, 2018 at 4:57 PM Benoit Hudzia <benoit@xxxxxxxxxxxxxxx> wrote:
>> Hi,
>> We currently segregate ceph pool PG allocation using the crush device class ruleset as described here: https://ceph.com/community/new-luminous-crush-device-classes/
>> We simply use the following command to define the rule: ceph osd crush rule create-replicated <RULE> default host <DEVICE CLASS>
>> However, we noticed that the rule is not strict in certain scenarios. By that I mean that if there is no OSD of the specific device class, ceph will allocate PGs for this pool to any other available OSD (creating an issue with the PG calculation when we want to add a new pool).
>>
>> Simple scenario:
>> 1. Create one pool <pool1>, replication 2, on 4 nodes with 1 OSD each, all belonging to class <pool1>.
>> 2. Remove all OSDs (delete them).
>> 3. Create 4 new OSDs (using the same disks but different IDs), but this time tag them with class <pool2>.
>> 4. Try to create pool <pool2> -> the pool creation fails with: Error ERANGE: pg_num 256 size 2 would mean 1024 total pgs, which exceeds max 800 (mon_max_pg_per_osd 200 * num_in_osds 4)
>>
>> Pool1 simply started allocating PGs to OSDs that don't belong to the ruleset.
>
> Are you sure pool 1's PGs are actually being placed on the wrong OSDs? Have you looked at the output of "ceph pg dump" to check that?
>
> It sounds more like the pool creation check is simply failing to consider the crush rules and applying a cruder global check.
>
> John
>
>> Which leads me to the following question: is there a way to make the crush rule a hard requirement? E.g. if we do not have any OSD matching the device class, it won't start trying to allocate PGs to OSDs that don't match it. Is there any way to prevent pool 1 from using those OSDs?
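For reference, the numbers in that ERANGE message work out as follows. This is a sketch of the arithmetic implied by the error text, not the monitor's actual code; in particular it shows that the check counts pool1's PGs against the new OSDs even though they belong to a different device class:

```shell
# Reproduce the arithmetic behind the ERANGE message (not the mon's actual code).
existing=$((256 * 2))           # pool1: pg_num 256, size 2 -> 512 PG instances
requested=$((256 * 2))          # pool2: pg_num 256, size 2 -> 512 more
total=$((existing + requested)) # 1024 "total pgs" in the message
limit=$((200 * 4))              # mon_max_pg_per_osd (200) * num_in_osds (4) = 800
echo "total=$total limit=$limit"
if [ "$total" -gt "$limit" ]; then
    echo "pool creation rejected (ERANGE)"
fi
```

If the check honoured the crush rule, pool1's 512 instances would not be charged against the 4 class-<pool2> OSDs and the creation would pass.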
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com