Hi Dominic,

I must say that I inherited this cluster and did not develop the crush
rule used. The rule reads:

    "rule_id": 1,
    "rule_name": "hdd",
    "ruleset": 1,
    "type": 1,
    "min_size": 2,
    "max_size": 3,
    "steps": [
        { "op": "take", "item": -31, "item_name": "DC3" },
        { "op": "choose_firstn", "num": 0, "type": "room" },
        { "op": "chooseleaf_firstn", "num": 1, "type": "host" },

Doesn't that say it will choose DC3, then a room within DC3 and then a
host? (I agree that the racks in the tree are superfluous, but they do no
harm either.)
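If I translate that dump into the decompiled crushtool text form (note
that a final "step emit" is implied but got cut off in the paste above),
I believe it reads roughly like this:

    rule hdd {
        id 1
        type replicated
        min_size 2
        max_size 3
        step take DC3
        step choose firstn 0 type room
        step chooseleaf firstn 1 type host
        step emit
    }

That is: start at DC3, pick as many rooms under it as there are replicas
(num 0 means "as many as the pool size"), then pick one host in each of
those rooms and take an OSD from it.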
Anyway, thanks for your effort. I hope someone else can explain why
setting the crush weight of an OSD to 0 results in surprisingly many PGs
going to other OSDs on the same node instead of to other nodes.

Marcel

> Marcel;
>
> To answer your question, I don't see anything that would be keeping these
> PGs on the same node. Someone with more knowledge of how the CRUSH rules
> are applied, and the code around these operations, would need to weigh in.
>
> I am somewhat curious though; you define racks, and even rooms, in your
> tree, but your failure domain is set to host. Is that intentional?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> DHilsbos@xxxxxxxxxxxxxx
> www.PerformAir.com
>
>
> -----Original Message-----
> From: Marcel Kuiper [mailto:ceph@xxxxxxxx]
> Sent: Tuesday, July 21, 2020 10:14 AM
> To: ceph-users@xxxxxxx
> Cc: Dominic Hilsbos
> Subject: Re: Re: osd out vs crush reweight
>
> Dominic,
>
> The crush rule dump and tree are attached (hope that works). All pools
> use crush_rule 1.
>
> Marcel
>
>> Marcel;
>>
>> Sorry, could you also send the output of:
>> ceph osd tree
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> DHilsbos@xxxxxxxxxxxxxx
>> www.PerformAir.com
>>
>>
>> -----Original Message-----
>> From: DHilsbos@xxxxxxxxxxxxxx [mailto:DHilsbos@xxxxxxxxxxxxxx]
>> Sent: Tuesday, July 21, 2020 9:41 AM
>> To: ceph@xxxxxxxx; ceph-users@xxxxxxx
>> Subject: Re: osd out vs crush reweight
>>
>> Marcel;
>>
>> Thank you for the information.
>>
>> Could you send the output of:
>> ceph osd crush rule dump
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> DHilsbos@xxxxxxxxxxxxxx
>> www.PerformAir.com
>>
>>
>> -----Original Message-----
>> From: Marcel Kuiper [mailto:ceph@xxxxxxxx]
>> Sent: Tuesday, July 21, 2020 9:38 AM
>> To: ceph-users@xxxxxxx
>> Subject: Re: osd out vs crush reweight
>>
>> Hi Dominic,
>>
>> This cluster is running 14.2.8 (Nautilus). There are 172 OSDs divided
>> over 19 nodes.
>> There are currently 10 pools.
>> All pools have 3 replicas of data.
>> There are 3968 PGs (the cluster is not yet fully in use; the number
>> of PGs is expected to grow).
>>
>> Marcel
>>
>>> Marcel;
>>>
>>> Short answer: yes, it might be expected behavior.
>>>
>>> PG placement is highly dependent on the cluster layout, and CRUSH
>>> rules. So... Some clarifying questions.
>>>
>>> What version of Ceph are you running?
>>> How many nodes do you have?
>>> How many pools do you have, and what are their failure domains?
>>>
>>> Thank you,
>>>
>>> Dominic L. Hilsbos, MBA
>>> Director - Information Technology
>>> Perform Air International, Inc.
>>> DHilsbos@xxxxxxxxxxxxxx
>>> www.PerformAir.com
>>>
>>>
>>> -----Original Message-----
>>> From: Marcel Kuiper [mailto:ceph@xxxxxxxx]
>>> Sent: Tuesday, July 21, 2020 6:52 AM
>>> To: ceph-users@xxxxxxx
>>> Subject: osd out vs crush reweight
>>>
>>> Hi list,
>>>
>>> I ran a test with marking an OSD out versus setting its crush weight
>>> to 0, and compared which OSDs the PGs were sent to. The crush map has
>>> 3 rooms. This is what happened.
>>>
>>> On 'ceph osd out 111' (first room; this node has OSDs 108 - 116), PGs
>>> were sent to the following OSDs:
>>>
>>> NR PGs  OSD
>>> 2       1
>>> 1       4
>>> 1       5
>>> 1       6
>>> 1       7
>>> 2       8
>>> 1       31
>>> 1       34
>>> 1       35
>>> 1       56
>>> 2       57
>>> 1       58
>>> 1       61
>>> 1       83
>>> 1       84
>>> 1       88
>>> 1       99
>>> 1       100
>>> 2       107
>>> 1       114
>>> 2       117
>>> 1       118
>>> 1       119
>>> 1       121
>>>
>>> All PGs were sent to OSDs on other nodes in the same room, except
>>> for 1 PG on osd.114. I think this works as expected.
>>>
>>> Now I marked the OSD in again and waited until everything stabilized.
>>> Then I set the crush weight to 0: 'ceph osd crush reweight osd.111 0'.
>>> I thought this also lowers the crush weight of the node, so there
>>> would be even less chance that PGs end up on an OSD of the same node.
>>> However, the results are:
>>>
>>> NR PGs  OSD
>>> 1       61
>>> 1       83
>>> 1       86
>>> 3       108
>>> 4       109
>>> 5       110
>>> 2       112
>>> 5       113
>>> 7       114
>>> 5       115
>>> 2       116
>>>
>>> Except for 3 PGs, all other PGs ended up on an OSD belonging to the
>>> same node :-O. Is this expected behaviour? Can someone explain? This
>>> is on Nautilus 14.2.8.
>>>
>>> Thanks
>>>
>>> Marcel
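
PS: in case it helps anyone reproduce this without touching a cluster,
this is roughly how I would compare the two cases offline with crushtool
(a sketch from memory, using osd.111 from the test above; double-check
the flags against the crushtool man page):

    # export the cluster's compiled crush map
    ceph osd getcrushmap -o crushmap.bin

    # baseline mappings for rule 1 with 3 replicas
    crushtool -i crushmap.bin --test --rule 1 --num-rep 3 \
        --show-mappings > baseline.txt

    # approximate 'ceph osd out 111': keep the crush weight, but override
    # the reweight of osd 111 to 0 for the simulation only
    crushtool -i crushmap.bin --test --rule 1 --num-rep 3 \
        --weight 111 0 --show-mappings > out.txt

    # approximate 'ceph osd crush reweight osd.111 0': write a new map
    # with the crush weight of osd.111 set to 0 and test that
    crushtool -i crushmap.bin --reweight-item osd.111 0 -o crushmap-zero.bin
    crushtool -i crushmap-zero.bin --test --rule 1 --num-rep 3 \
        --show-mappings > reweight.txt

    # see which PGs map differently in each case
    diff baseline.txt out.txt
    diff baseline.txt reweight.txt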