Mike:

Thanks for the reply. However, I ran the crushtool command and the output doesn't give me any obvious explanation for why osd.4 should be the primary OSD for the PGs. Every rule has "step chooseleaf firstn 0 type host". According to the Ceph documentation, a PG should select two buckets of type host, and all of the OSDs have the same weight, type, etc. So why would every PG choose osd.4 as its primary OSD?

Here is the content of my crush map.

****************************************************************************************************
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host gbl10134201 {
        id -2           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.4 weight 0.000
}
host gbl10134202 {
        id -3           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 0.000
}
host gbl10134203 {
        id -4           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 0.000
}
host gbl10134214 {
        id -5           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.3 weight 0.000
}
host gbl10134215 {
        id -6           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.000
}
root default {
        id -1           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item gbl10134201 weight 0.000
        item gbl10134202 weight 0.000
        item gbl10134203 weight 0.000
        item gbl10134214 weight 0.000
        item gbl10134215 weight 0.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
****************************************************************************************************

Regards,

Chen

-----Original Message-----
From: Mike Dawson [mailto:mike.dawson@xxxxxxxxxxxx]
Sent: Tuesday, October 01, 2013 11:31 AM
To: Chen, Ching-Cheng (KFRM 1); ceph-users@xxxxxxxxxxxxxx
Subject: Re: Weird behavior of PG distribution

Ching-Cheng,

Data placement is handled by CRUSH. Please examine the following:

ceph osd getcrushmap -o crushmap && crushtool -d crushmap -o crushmap.txt && cat crushmap.txt

That will show the topology and placement rules Ceph is using. Pay close attention to the "step chooseleaf" lines inside the rule for each pool. Under certain configurations, I believe the placement that you describe is in fact the expected behavior.

Thanks,
Mike Dawson
Co-Founder, Cloudapt LLC

On 10/1/2013 10:46 AM, Chen, Ching-Cheng (KFRM 1) wrote:
> Found some weird behavior (or at least what looks weird) with ceph 0.67.3.
>
> I have 5 servers. The monitor runs on server 1, and servers 2 to 5 each
> have one OSD running (osd.0 - osd.3).
>
> I did a 'ceph pg dump' and can see the PGs got more or less randomly
> distributed across all 4 OSDs, which is the expected behavior.
>
> However, if I bring up one OSD on the same server running the monitor, it
> seems all PGs have their primary OSD moved to this new OSD. After I add a
> new OSD (osd.4) to that server, the 'ceph pg dump' command shows the
> acting OSDs as [4,x] for all PGs.
>
> Is this expected behavior?
>
> Regards,
>
> Chen
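
A quick way to see what the compiled map actually does, rather than eyeballing the decompiled text, is crushtool's test mode. The commands below are only a sketch: they assume the compiled map is still in the file "crushmap" produced by the getcrushmap command above, and the exact --show-mappings output format can differ a little between releases, but each mapping line ends with the OSD set chosen for one sample input, and the first entry in that set is the primary. It may also be worth noting that every item in the map above carries weight 0.000; CRUSH uses those weights to drive bucket selection, so re-running the same test after giving the items non-zero weights is one way to rule the zero weights in or out.

    # Simulate rule 0 ("data") with 2 replicas over 1024 sample inputs and
    # print the OSD set chosen for each one (first entry = primary).
    crushtool -i crushmap --test --rule 0 --num-rep 2 \
              --min-x 0 --max-x 1023 --show-mappings

    # Rough tally of how often each OSD is selected first (i.e. would be
    # the primary), by pulling the first id out of each bracketed set.
    crushtool -i crushmap --test --rule 0 --num-rep 2 \
              --min-x 0 --max-x 1023 --show-mappings 2>&1 \
        | grep -Eo '\[[0-9]+' | tr -d '[' | sort | uniq -c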
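
On the 'ceph pg dump' side, a rough count of which OSD shows up first in the bracketed sets makes a skew like [4,x] everywhere obvious at a glance. Again, just a sketch: the plain-text pg dump layout varies between releases, and the pipeline below counts every bracketed set it finds (so both the 'up' and 'acting' columns, plus any heartbeat lists near the bottom of the dump), which is fine for spotting everything piling onto a single OSD but not for exact numbers.

    # Tally the first OSD id inside every bracketed set such as [4,1].
    ceph pg dump 2>/dev/null \
        | grep -Eo '\[[0-9][0-9,]*\]' \
        | cut -d'[' -f2 | cut -d',' -f1 | tr -d ']' \
        | sort -n | uniq -c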