Re: Weird behavior of PG distribution

"Chen, Ching-Cheng (KFRM 1)" <chingcheng.chen@xxxxxxxxxxxxxxxxx> · Tue, 1 Oct 2013 19:10:29 +0000

Aaron:

Bingo!

All 5 of my VMs have exactly the same setup, so I didn't bother setting weights, thinking that if they were all 0.000 they would be treated equally.

After following your suggestion and putting in some numbers (I made them all 0.200), I got the expected behavior.
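
A toy model may help show why all-zero weights need not mean "treated equally". The sketch below is a deliberately simplified straw-style draw, not Ceph's actual CRUSH code: the function names, the use of sha256 in place of rjenkins1, and the tie-breaking rule are all assumptions made for illustration. Each item's pseudo-random draw is scaled by its weight and the largest draw wins, so when every weight is zero every draw is zero and the first item listed keeps the win.

```python
import hashlib

# Host order as listed under "root default" in the crush map.
HOSTS = ["gbl10134201", "gbl10134202", "gbl10134203", "gbl10134214", "gbl10134215"]


def toy_straw_choose(items, weights, pg_id):
    """Toy straw draw: each item gets a pseudo-random draw scaled by its
    weight and the largest draw wins. NOT Ceph's real CRUSH code."""
    best_index = 0
    best_draw = 0.0
    for index, (item, weight) in enumerate(zip(items, weights)):
        # Assumption: sha256 stands in for CRUSH's rjenkins1 hash.
        digest = hashlib.sha256(f"{pg_id}:{item}".encode()).digest()
        draw = int.from_bytes(digest[:2], "big") * weight
        # Strict comparison: on a tie the earlier item keeps the win.
        if index == 0 or draw > best_draw:
            best_index, best_draw = index, draw
    return items[best_index]


def winners(weights, pg_count=200):
    """Return the set of hosts that win at least one of pg_count draws."""
    return {toy_straw_choose(HOSTS, weights, pg) for pg in range(pg_count)}
```

In this model, winners([0.0] * 5) contains only gbl10134201, the host that holds osd.4, while equal nonzero weights such as winners([0.2] * 5) should spread the wins across several hosts.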

Really appreciated,

Ching-Cheng Chen
MDS - New York
+1 212 538 8031 (*106 8031)

From: Aaron Ten Clay [mailto:aarontc@xxxxxxxxxxx]
Sent: Tuesday, October 01, 2013 2:53 PM
To: Chen, Ching-Cheng (KFRM 1)
Cc: Mike Dawson; ceph-users@xxxxxxxxxxxxxx
Subject: Re: Weird behavior of PG distribution

On Tue, Oct 1, 2013 at 10:11 AM, Chen, Ching-Cheng (KFRM 1) <chingcheng.chen@xxxxxxxxxxxxxxxxx> wrote:
Mike:

Thanks for the reply.

However, I ran the crushtool command, but the output doesn't give me any obvious explanation of why osd.4 should be the primary OSD for all PGs.

All the rules have the step "step chooseleaf firstn 0 type host". According to the Ceph documentation, each PG should select two buckets of the host type. And all OSDs have the same weight, type, etc. Why would every PG choose osd.4 as its primary OSD?

Here is the content of my crush map.

****************************************************************************************************

# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host gbl10134201 {
        id -2           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.4 weight 0.000
}
host gbl10134202 {
        id -3           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 0.000
}
host gbl10134203 {
        id -4           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 0.000
}
host gbl10134214 {
        id -5           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.3 weight 0.000
}
host gbl10134215 {
        id -6           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.000
}
root default {
        id -1           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item gbl10134201 weight 0.000
        item gbl10134202 weight 0.000
        item gbl10134203 weight 0.000
        item gbl10134214 weight 0.000
        item gbl10134215 weight 0.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map

****************************************************************************************************

Regards,

Chen

You have a weight of 0 for all your OSDs... I bet that's confusing Ceph just a bit. I believe the baseline recommendation is to use the size of the OSD in TiB as the starting weight for each OSD, and at each level up the tree you use the combined weight of everything below it (although with a single OSD per host you don't need to worry about that yet).
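
The weighting convention described above can be sketched roughly as follows. This is a minimal illustration, assuming the disk size is known in bytes; the helper names osd_weight_from_bytes and bucket_weight are hypothetical and not part of any Ceph tool.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def osd_weight_from_bytes(size_bytes):
    """Starting CRUSH weight for an OSD: its capacity expressed in TiB."""
    return round(size_bytes / TIB, 3)


def bucket_weight(child_weights):
    """Weight of a bucket (host, rack, root): the sum of everything below it."""
    return round(sum(child_weights), 3)


# A nominal 4 TB disk holds 4 * 10**12 bytes, which is a bit under 4 TiB.
four_tb_weight = osd_weight_from_bytes(4 * 10**12)

# One OSD per host: the host weight equals the OSD weight, and the root
# weight is the sum over all five hosts.
host_weights = [bucket_weight([four_tb_weight]) for _ in range(5)]
root_weight = bucket_weight(host_weights)
```

Here four_tb_weight comes out at about 3.638, close to the 3.630 used in the example in this message.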

You might want to try e.g.

...

host gbl10134215 {
        id -6           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 3.630
}
root default {
        id -1           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item gbl10134201 weight 0.000
        item gbl10134202 weight 0.000
        item gbl10134203 weight 0.000
        item gbl10134214 weight 0.000
        item gbl10134215 weight 3.630
}

...

(If you have 4TB disks)

Once you edit the crush map, you compile it with 'crushtool -c <infile> -o <outfile>', then set that map active on the cluster with 'ceph osd setcrushmap -i <outfile>'.
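
For anyone scripting this, the compile-and-inject step might look roughly like the sketch below. The function names and the file names crushmap.txt and crushmap.bin are hypothetical; the commands use crushtool's -o flag for the output file. The helper defaults to a dry run that only returns the commands, so nothing touches the cluster unless dry_run=False is passed.

```python
import subprocess


def crush_update_commands(text_map, compiled_map):
    """Build the compile and inject commands for an edited CRUSH map."""
    return [
        # Compile the edited text map into binary form.
        ["crushtool", "-c", text_map, "-o", compiled_map],
        # Make the compiled map active on the cluster.
        ["ceph", "osd", "setcrushmap", "-i", compiled_map],
    ]


def apply_crush_map(text_map, compiled_map, dry_run=True):
    """Return the commands; run them in order only when dry_run is False."""
    commands = crush_update_commands(text_map, compiled_map)
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)
    return commands
```

Calling apply_crush_map("crushmap.txt", "crushmap.bin") returns the two command lists for review before anything is executed.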

You can read more at http://ceph.com/docs/master/rados/operations/crush-map/.

-Aaron


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com