Re: osd out vs crush reweight]


 



Marcel;

To answer your question, I don't see anything that would be keeping these PGs on the same node. Someone with more knowledge of how the CRUSH rules are applied, and of the code around these operations, would need to weigh in.

I am somewhat curious, though: you define racks, and even rooms, in your tree, but your failure domain is set to host. Is that intentional?
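
If room- or rack-level redundancy was the intent, something along these lines could be used (a sketch only: the rule name is a placeholder, <pool-name> needs to be filled in, this assumes the root bucket is called "default", and changing a pool's rule will likely trigger data movement):

ceph osd crush rule create-replicated replicated_room default room
ceph osd pool set <pool-name> crush_rule replicated_room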

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
DHilsbos@xxxxxxxxxxxxxx 
www.PerformAir.com



-----Original Message-----
From: Marcel Kuiper [mailto:ceph@xxxxxxxx] 
Sent: Tuesday, July 21, 2020 10:14 AM
To: ceph-users@xxxxxxx
Cc: Dominic Hilsbos
Subject: Re:  Re: osd out vs crush reweight]

Dominic

The crush rule dump and tree are attached (I hope that works). All pools use crush_rule 1.

Marcel

> Marcel;
>
> Sorry, could you also send the output of:
> ceph osd tree
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> DHilsbos@xxxxxxxxxxxxxx
> www.PerformAir.com
>
>
>
> -----Original Message-----
> From: DHilsbos@xxxxxxxxxxxxxx [mailto:DHilsbos@xxxxxxxxxxxxxx]
> Sent: Tuesday, July 21, 2020 9:41 AM
> To: ceph@xxxxxxxx; ceph-users@xxxxxxx
> Subject:  Re: osd out vs crush reweight]
>
> Marcel;
>
> Thank you for the information.
>
> Could you send the output of:
> ceph osd crush rule dump
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> DHilsbos@xxxxxxxxxxxxxx
> www.PerformAir.com
>
>
>
> -----Original Message-----
> From: Marcel Kuiper [mailto:ceph@xxxxxxxx]
> Sent: Tuesday, July 21, 2020 9:38 AM
> To: ceph-users@xxxxxxx
> Subject:  Re: osd out vs crush reweight]
>
>
> Hi Dominic,
>
> This cluster is running 14.2.8 (Nautilus). There are 172 OSDs divided
> over 19 nodes.
> There are currently 10 pools.
> All pools have 3 replicas of data.
> There are 3968 PGs (the cluster is not yet fully in use; the number
> of PGs is expected to grow).
>
> Marcel
>
>> Marcel;
>>
>> Short answer: yes, it might be expected behavior.
>>
>> PG placement is highly dependent on the cluster layout and CRUSH rules.
>> So... some clarifying questions:
>>
>> What version of Ceph are you running?
>> How many nodes do you have?
>> How many pools do you have, and what are their failure domains?
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> DHilsbos@xxxxxxxxxxxxxx
>> www.PerformAir.com
>>
>>
>> -----Original Message-----
>> From: Marcel Kuiper [mailto:ceph@xxxxxxxx]
>> Sent: Tuesday, July 21, 2020 6:52 AM
>> To: ceph-users@xxxxxxx
>> Subject:  osd out vs crush reweight
>>
>> Hi list,
>>
>> I ran a test with marking an OSD out versus setting its crush weight
>> to 0, and compared which OSDs the PGs were sent to. The crush map has
>> 3 rooms. This is what happened.
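>>
>> (For reference, one way to make this kind of comparison - a sketch,
>> not necessarily the exact commands used here, and the file names are
>> arbitrary:
>>
>> ceph pg dump pgs_brief | sort > before.txt
>> ceph osd out 111
>> # wait for the cluster to settle, then:
>> ceph pg dump pgs_brief | sort > after.txt
>> diff before.txt after.txt)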
>>
>> On 'ceph osd out 111' (first room; this node has osds 108 - 116), PGs
>> were sent to the following OSDs:
>>
>> NR PGs    OSD
>>       2   1
>>       1   4
>>       1   5
>>       1   6
>>       1   7
>>       2   8
>>       1   31
>>       1   34
>>       1   35
>>       1   56
>>       2   57
>>       1   58
>>       1   61
>>       1   83
>>       1   84
>>       1   88
>>       1   99
>>       1   100
>>       2   107
>>       1   114
>>       2   117
>>       1   118
>>       1   119
>>       1   121
>>
>> All PGs were sent to OSDs on other nodes in the same room, except
>> for 1 PG on osd 114. I think this works as expected.
>>
>> Now I marked the OSD in again and waited until everything stabilized.
>> Then I set its crush weight to 0: ceph osd crush reweight osd.111 0. I
>> thought this would also lower the crush weight of the node, so there
>> would be even less chance of PGs ending up on an OSD of the same node.
>> However, the results are:
>>
>> NR PGs    OSD
>>       1   61
>>       1   83
>>       1   86
>>       3   108
>>       4   109
>>       5   110
>>       2   112
>>       5   113
>>       7   114
>>       5   115
>>       2   116
>>
>> Except for 3 PGs, all other PGs ended up on an OSD belonging to the
>> same node :-O. Is this expected behaviour? Can someone explain? This
>> is on Nautilus 14.2.8.
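>>
>> (For what it is worth, a sketch of how to see what each operation
>> actually changes - 'ceph osd tree' prints both the crush WEIGHT and the
>> REWEIGHT column, and osd.111 is just the example id used above:
>>
>> ceph osd out 111                    # REWEIGHT drops to 0, crush weight unchanged
>> ceph osd tree | grep -E 'osd\.111|host'
>> ceph osd in 111                     # restore, then try the other operation
>> ceph osd crush reweight osd.111 0   # the crush weight itself goes to 0
>> ceph osd tree | grep -E 'osd\.111|host')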
>>
>> Thanks
>>
>> Marcel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


