Hi,

You have too many PGs for too few OSDs.

As the docs you linked say:

When using multiple data pools for storing objects, you need to ensure that you balance the number of placement groups per pool with the number of placement groups per OSD so that you arrive at a reasonable total number of placement groups that provides reasonably low variance per OSD without taxing system resources or making the peering process too slow.

For instance a cluster of 10 pools each with 512 placement groups on ten OSDs is a total of 5,120 placement groups spread over ten OSDs, that is 512 placement groups per OSD. That does not use too many resources. However, if 1,000 pools were created with 512 placement groups each, the OSDs will handle ~50,000 placement groups each and it would require significantly more resources and time for peering.

So, remove unused pools or add OSDs.

On 06/05/2015 23:32, Chris Armstrong wrote:
> Hi folks,
>
> Calling on the collective Ceph knowledge here. Since upgrading to
> Hammer, we're now seeing:
>
> health HEALTH_WARN
> too many PGs per OSD (1536 > max 300)
>
> We have 3 OSDs, so we have used the pg_num of 128 based on the
> suggestion here:
> http://ceph.com/docs/master/rados/operations/placement-groups/
>
> We're also using the 12 default pools:
> root@ca-deis-1:/# ceph osd lspools
> 0 rbd,1 data,2 metadata,3 .rgw.root,4 .rgw.control,5 .rgw,6 .rgw.gc,7 .users.uid,8 .users,9 .rgw.buckets.index,10 .rgw.buckets,11 .rgw.buckets.extra,
>
>
> Here's the output of ceph osd dump:
>
> root@ca-deis-1:/# ceph osd dump
> epoch 46
> fsid 7bd27c76-f5f8-4eea-819b-379177929653
> created 2015-05-06 20:40:01.658764
> modified 2015-05-06 21:05:18.391730
> flags
> pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 18 flags hashpspool stripe_width 0
> pool 1 'data' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 11 flags hashpspool crash_replay_interval 45 stripe_width 0
> pool 2 'metadata' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 10 flags hashpspool stripe_width 0
> pool 3 '.rgw.root' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 20 flags hashpspool stripe_width 0
> pool 4 '.rgw.control' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 22 flags hashpspool stripe_width 0
> pool 5 '.rgw' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 24 flags hashpspool stripe_width 0
> pool 6 '.rgw.gc' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 25 flags hashpspool stripe_width 0
> pool 7 '.users.uid' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 26 flags hashpspool stripe_width 0
> pool 8 '.users' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 28 flags hashpspool stripe_width 0
> pool 9 '.rgw.buckets.index' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 30 flags hashpspool stripe_width 0
> pool 10 '.rgw.buckets' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 35 flags hashpspool stripe_width 0
> pool 11 '.rgw.buckets.extra' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 40 flags hashpspool stripe_width 0
> max_osd 3
> osd.0 up in weight 1 up_from 4 up_thru 45 down_at 0 last_clean_interval [0,0) 10.132.162.16:6800/1 10.132.162.16:6801/1 10.132.162.16:6802/1 10.132.162.16:6803/1 exists,up d996b242-7fce-475f-a889-fa14038de180
> osd.1 up in weight 1 up_from 7 up_thru 45 down_at 0 last_clean_interval [0,0) 10.132.253.121:6800/1 10.132.253.121:6801/1 10.132.253.121:6802/1 10.132.253.121:6803/1 exists,up 8ef7080d-ca37-4003-ae54-b76ddd13f752
> osd.2 up in weight 1 up_from 45 up_thru 45 down_at 43 last_clean_interval [38,44) 10.132.253.118:6801/1 10.132.253.118:6805/1000001 10.132.253.118:6806/1000001 10.132.253.118:6807/1000001 exists,up 7b30f8aa-732b-4dca-bfbd-2dca9fb3c5ec
>
> Note that we have 3 replicas of our data (size 3) so that we can operate
> with just one host up.
>
> We've seen performance issues before (especially during platform start),
> which has me thinking - are we using too many placement groups given the
> small number of OSDs and the fact that we're forcing each OSD to have a
> full set of the data with size=3? Maybe the performance issues are to be
> expected since we're pushing around so many PGs on startup.
>
> This logic has not changed since our use of firefly and giant, so I'm
> not sure what changed. Some guidance is appreciated.
>
> Thanks!
>
> Chris
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
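
For reference, the 1536 in the warning can be reproduced from the osd dump in the quoted message. This is only a rough sketch, assuming the health check counts every replica of every PG and averages over the OSDs; the function names and the power-of-two helper are illustrative and not part of Ceph. The pool sizes (12 pools, pg_num 128, size 3), the 3 OSDs, and the limit of 300 are taken from the message.

```python
def pgs_per_osd(pools, num_osds):
    """Average number of PG replicas held by each OSD.

    pools is a list of (pg_num, size) tuples, one per pool.
    Assumption: every replica of a PG counts toward the per-OSD total.
    """
    total_replicas = sum(pg_num * size for pg_num, size in pools)
    return total_replicas / num_osds


def largest_pow2_pg_num(num_pools, size, num_osds, limit):
    """Largest power-of-two pg_num that keeps identical pools under limit.

    Hypothetical helper for illustration: assumes all pools share the
    same pg_num and replica count.
    """
    budget = (limit * num_osds) // (num_pools * size)  # PGs allowed per pool
    pg_num = 1
    while pg_num * 2 <= budget:
        pg_num *= 2
    return pg_num


# Values from the quoted 'ceph osd dump': 12 pools, pg_num 128, size 3, 3 OSDs.
pools = [(128, 3)] * 12
print(pgs_per_osd(pools, 3))               # 1536.0, matching the warning
print(largest_pow2_pg_num(12, 3, 3, 300))  # 16
```

With all 12 pools kept, a pg_num of 16 per pool would give 192 PGs per OSD, which is under the limit of 300. As far as I know, pg_num of an existing pool cannot be decreased in this release, so a smaller value would only apply to pools that are recreated; removing unused pools or adding OSDs, as suggested above, avoids that.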