It is not a CRUSH map thing. What is your PG/OSD ratio? Ceph recommends 100-200 PGs per OSD (after multiplying by the replica count or EC stripe width). Even at that ratio, we have observed roughly 20-40% variation in PG distribution across OSDs. You can try a higher PG/OSD ratio, but be warned that the messenger system may then consume too much system resource. A workaround is to run reweight-by-utilization after the cluster has filled to a certain level, but that means a lot of data movement and a performance penalty for online traffic. (A rough diagnosis sketch follows at the end of this message.)

From: Cao, Buddy <buddy.cao at intel.com>
Date: Tuesday, July 1, 2014 at 11:52 PM
To: ceph-users at lists.ceph.com
Subject: ceph data replication not even on every osds

Hi,

I set the same weight for all hosts and the same weight for all OSDs under those hosts in the CRUSH map, and set the pool replica size to 3. However, after uploading 1M/4M/400M/900M files to the pool, I found that the data is not replicated evenly across the OSDs and the OSD utilizations differ, ranging from 25% to 70%. Could you advise whether this is the nature of Ceph, or whether there is some tricky setting in the CRUSH map?

rule r1 {
        ruleset 0
        type replicated
        min_size 0
        max_size 10
        step take root
        step chooseleaf firstn 0 type host
        step emit
}

Wei Cao (Buddy)
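
For reference, a minimal sketch of how one might check the PG distribution and apply the workaround described above. The pool name "mypool" and the 120 threshold are placeholders; adjust them for your cluster, and note that exact command output varies by Ceph release.

    # Count OSDs and confirm host/OSD weights
    ceph osd tree

    # Current pg_num for the pool ("mypool" is a placeholder name)
    ceph osd pool get mypool pg_num

    # Rule of thumb: pg_num per pool ~= (num_osds * 100) / replica_size,
    # rounded up to the next power of two

    # Per-OSD PG counts, to see how uneven the distribution actually is
    ceph pg dump osds

    # Workaround mentioned above: reweight OSDs whose utilization exceeds
    # 120% of the cluster average (expect data movement while it rebalances)
    ceph osd reweight-by-utilization 120

Run the dump before and after the reweight to confirm the spread has narrowed.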