It is not a CRUSH map thing. What is your PG/OSD ratio? Ceph recommends 100-200 PGs per OSD (after multiplying by the replica count or EC stripe width). Even at that ratio, we have observed roughly 20-40% variation in PG distribution across OSDs. You can try a higher PG/OSD ratio, but be warned that the messenger system may then consume too much system resource. A workaround is to run reweight-by-utilization after the cluster has filled to a certain level, but that means a lot of data movement and a performance penalty for online traffic. (A rough diagnosis sketch follows at the end of this message.)

From: Cao, Buddy <buddy.cao at intel.com>
Date: Tuesday, July 1, 2014 at 11:52 PM
To: ceph-users at lists.ceph.com
Subject: ceph data replication not even on every osds

Hi,

I set the same weight for all hosts and the same weight for all OSDs under those hosts in the CRUSH map, and set the pool replica size to 3. However, after uploading 1M/4M/400M/900M files to the pool, I found that the data is not replicated evenly across the OSDs and the OSD utilizations differ, ranging from 25% to 70%. Could you advise whether this is the nature of Ceph, or whether there is some tricky setting in the CRUSH map?

rule r1 {
        ruleset 0
        type replicated
        min_size 0
        max_size 10
        step take root
        step chooseleaf firstn 0 type host
        step emit
}

Wei Cao (Buddy)
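
For reference, a minimal sketch of how one might check the PG distribution and apply the workaround described above. The pool name "mypool" and the 120 threshold are placeholders; adjust them for your cluster, and note that exact command output varies by Ceph release.

    # Count OSDs and confirm host/OSD weights
    ceph osd tree

    # Current pg_num for the pool ("mypool" is a placeholder name)
    ceph osd pool get mypool pg_num

    # Rule of thumb: pg_num per pool ~= (num_osds * 100) / replica_size,
    # rounded up to the next power of two

    # Per-OSD PG counts, to see how uneven the distribution actually is
    ceph pg dump osds

    # Workaround mentioned above: reweight OSDs whose utilization exceeds
    # 120% of the cluster average (expect data movement while it rebalances)
    ceph osd reweight-by-utilization 120

Run the dump before and after the reweight to confirm the spread has narrowed.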