Mike:

Thanks for the reply. However, I ran the crushtool command and the output doesn't give me any obvious explanation for why osd.4 should be the primary OSD for the PGs. Every rule has "step chooseleaf firstn 0 type host". According to the Ceph documentation, a PG should select two buckets of type host, and all of the OSDs have the same weight, type, etc. So why would every PG choose osd.4 as its primary OSD?

Here is the content of my crush map.

****************************************************************************************************
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host gbl10134201 {
        id -2           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.4 weight 0.000
}
host gbl10134202 {
        id -3           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 0.000
}
host gbl10134203 {
        id -4           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 0.000
}
host gbl10134214 {
        id -5           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.3 weight 0.000
}
host gbl10134215 {
        id -6           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.000
}
root default {
        id -1           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item gbl10134201 weight 0.000
        item gbl10134202 weight 0.000
        item gbl10134203 weight 0.000
        item gbl10134214 weight 0.000
        item gbl10134215 weight 0.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
****************************************************************************************************

Regards,

Chen

-----Original Message-----
From: Mike Dawson [mailto:mike.dawson@xxxxxxxxxxxx]
Sent: Tuesday, October 01, 2013 11:31 AM
To: Chen, Ching-Cheng (KFRM 1); ceph-users@xxxxxxxxxxxxxx
Subject: Re: Weird behavior of PG distribution

Ching-Cheng,

Data placement is handled by CRUSH. Please examine the following:

ceph osd getcrushmap -o crushmap && crushtool -d crushmap -o crushmap.txt && cat crushmap.txt

That will show the topology and placement rules Ceph is using. Pay close attention to the "step chooseleaf" lines inside the rule for each pool. Under certain configurations, I believe the placement that you describe is in fact the expected behavior.

Thanks,
Mike Dawson
Co-Founder, Cloudapt LLC

On 10/1/2013 10:46 AM, Chen, Ching-Cheng (KFRM 1) wrote:
> Found some weird behavior (or at least what looks weird) with ceph 0.67.3.
>
> I have 5 servers. The monitor runs on server 1, and servers 2 to 5 each
> have one OSD running (osd.0 - osd.3).
>
> I did a 'ceph pg dump' and can see the PGs got more or less randomly
> distributed across all 4 OSDs, which is the expected behavior.
>
> However, if I bring up one OSD on the same server running the monitor, it
> seems all PGs have their primary OSD moved to this new OSD. After I add a
> new OSD (osd.4) to that server, the 'ceph pg dump' command shows the
> acting OSDs as [4,x] for all PGs.
>
> Is this expected behavior?
>
> Regards,
>
> Chen
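
A quick way to see what the compiled map actually does, rather than eyeballing the decompiled text, is crushtool's test mode. The commands below are only a sketch: they assume the compiled map is still in the file "crushmap" produced by the getcrushmap command above, and the exact --show-mappings output format can differ a little between releases, but each mapping line ends with the OSD set chosen for one sample input, and the first entry in that set is the primary. It may also be worth noting that every item in the map above carries weight 0.000; CRUSH uses those weights to drive bucket selection, so re-running the same test after giving the items non-zero weights is one way to rule the zero weights in or out.

    # Simulate rule 0 ("data") with 2 replicas over 1024 sample inputs and
    # print the OSD set chosen for each one (first entry = primary).
    crushtool -i crushmap --test --rule 0 --num-rep 2 \
              --min-x 0 --max-x 1023 --show-mappings

    # Rough tally of how often each OSD is selected first (i.e. would be
    # the primary), by pulling the first id out of each bracketed set.
    crushtool -i crushmap --test --rule 0 --num-rep 2 \
              --min-x 0 --max-x 1023 --show-mappings 2>&1 \
        | grep -Eo '\[[0-9]+' | tr -d '[' | sort | uniq -c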
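
On the 'ceph pg dump' side, a rough count of which OSD shows up first in the bracketed sets makes a skew like [4,x] everywhere obvious at a glance. Again, just a sketch: the plain-text pg dump layout varies between releases, and the pipeline below counts every bracketed set it finds (so both the 'up' and 'acting' columns, plus any heartbeat lists near the bottom of the dump), which is fine for spotting everything piling onto a single OSD but not for exact numbers.

    # Tally the first OSD id inside every bracketed set such as [4,1].
    ceph pg dump 2>/dev/null \
        | grep -Eo '\[[0-9][0-9,]*\]' \
        | cut -d'[' -f2 | cut -d',' -f1 | tr -d ']' \
        | sort -n | uniq -c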