Re: fixing a bad PG per OSD decision with pg-autoscaling?

"EDH - Manuel Rios Fernandez" <mriosfer@xxxxxxxxxxxxxxxx> · Wed, 21 Aug 2019 09:21:50 +0200

Hi Nigel,

In Nautilus you can decrease the number of PGs, but it takes weeks. For example, it took us more than two weeks to go from 4096 to 2048 PGs.
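Before shrinking a pool, it helps to know what value to aim for. The sketch below uses the common rule of thumb (target PGs per OSD times OSD count, divided by replica count, rounded to the nearest power of two). The target of 100 PGs per OSD and the "only act when off by more than 3x" threshold are assumptions based on how the Nautilus pg_autoscaler is usually described; this is not the autoscaler's actual code.

```python
import math


def nearest_power_of_two(n: float) -> int:
    """Round n to the nearest power of two (ties round up)."""
    if n <= 1:
        return 1
    upper = 1 << math.ceil(math.log2(n))
    lower = upper >> 1
    return lower if (n - lower) < (upper - n) else upper


def suggested_pg_num(num_osds: int, replica_size: int,
                     target_pgs_per_osd: int = 100,
                     capacity_ratio: float = 1.0) -> int:
    """Rule-of-thumb pg_num for a pool holding `capacity_ratio` of the data.

    target_pgs_per_osd=100 is an assumed target, not a value from this thread.
    """
    raw = target_pgs_per_osd * num_osds * capacity_ratio / replica_size
    return nearest_power_of_two(raw)


def autoscaler_would_act(current: int, suggested: int,
                         threshold: float = 3.0) -> bool:
    """Assumed behaviour: only change pg_num when it is off by more than
    `threshold` times in either direction."""
    ratio = max(current, suggested) / min(current, suggested)
    return ratio > threshold
```

For example, with a hypothetical 40 OSDs and 3x replication, `suggested_pg_num(40, 3)` gives 1024, so a pool sitting at 4096 would be 4x too big and past the assumed threshold.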

First of all, pg-autoscaling can be enabled per pool. You're going to get a lot of warnings, but it works.
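A minimal sketch of turning the autoscaler on pool by pool, assuming the Nautilus commands `ceph mgr module enable pg_autoscaler` and `ceph osd pool set <pool> pg_autoscale_mode <off|warn|on>`. The pool names in the usage line are hypothetical, and `dry_run=True` only prints the commands so they can be reviewed first.

```python
import shlex
import subprocess
from typing import Iterable, List

VALID_MODES = {"off", "warn", "on"}


def autoscale_commands(pools: Iterable[str], mode: str = "warn") -> List[List[str]]:
    """Build the ceph CLI calls to enable the autoscaler on the given pools."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown pg_autoscale_mode: {mode!r}")
    cmds = [["ceph", "mgr", "module", "enable", "pg_autoscaler"]]
    for pool in pools:
        cmds.append(["ceph", "osd", "pool", "set", pool, "pg_autoscale_mode", mode])
    # Show what the autoscaler now recommends for every pool.
    cmds.append(["ceph", "osd", "pool", "autoscale-status"])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Starting one pool at a time in `warn` mode (e.g. `run(autoscale_commands(["rbd"], "warn"))`) lets you see the recommendations as health warnings before letting it change `pg_num` with `on`.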

Normally it's recommended to upgrade a cluster only when it is in HEALTH_OK state.

It's also recommended to use the upmap mode of the balancer module to get a perfect distribution, but it doesn't work while the cluster has misplaced or degraded PGs.
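A small sketch of gating the switch to upmap on a clean cluster. It follows the stricter rule suggested above (wait until nothing is misplaced or degraded) rather than the balancer's own internal thresholds, and it takes the two ratios as plain numbers, for example copied from `ceph -s`. Note that `set-require-min-compat-client luminous` only works if every client is Luminous or newer.

```python
from typing import List


def balancer_upmap_plan(misplaced_ratio: float,
                        degraded_ratio: float) -> List[List[str]]:
    """Return the commands that switch the balancer to upmap mode, or an
    empty list if the cluster still has misplaced or degraded objects."""
    if misplaced_ratio > 0 or degraded_ratio > 0:
        # Let recovery finish first; the balancer won't make useful progress.
        return []
    return [
        # upmap entries require Luminous-or-newer clients.
        ["ceph", "osd", "set-require-min-compat-client", "luminous"],
        ["ceph", "balancer", "mode", "upmap"],
        ["ceph", "balancer", "on"],
    ]
```

With the 12% misplaced mentioned in the original message, `balancer_upmap_plan(0.12, 0.0)` returns an empty plan.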

From my point of view, I would try to get the cluster healthy first, then upgrade.

Remember that you MUST repair all your pre-Nautilus OSDs, because the statistics scheme changed.
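A hedged sketch of that repair pass, assuming BlueStore OSDs, systemd units named `ceph-osd@<id>`, and the default data path `/var/lib/ceph/osd/<cluster>-<id>`. It only builds the command list. In practice you would repair one OSD (or one host) at a time and wait for the cluster to settle before moving on.

```python
from typing import List


def bluestore_repair_plan(osd_ids: List[int],
                          cluster: str = "ceph") -> List[List[str]]:
    """Commands to stop, repair and restart each OSD, wrapped in noout."""
    # Keep the cluster from marking stopped OSDs out and rebalancing.
    plan = [["ceph", "osd", "set", "noout"]]
    for osd_id in osd_ids:
        path = f"/var/lib/ceph/osd/{cluster}-{osd_id}"  # assumed default layout
        plan += [
            ["systemctl", "stop", f"ceph-osd@{osd_id}"],
            ["ceph-bluestore-tool", "repair", "--path", path],
            ["systemctl", "start", f"ceph-osd@{osd_id}"],
        ]
    plan.append(["ceph", "osd", "unset", "noout"])
    return plan
```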

Regards

Manuel

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On behalf of Nigel Williams
Sent: Wednesday, 21 August 2019 0:33
To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: fixing a bad PG per OSD decision with pg-autoscaling?

Due to a gross miscalculation several years ago, I set way too many PGs for our original Hammer cluster. We've lived with it ever since, but now that we are on Luminous, any change results in stuck requests and balancing problems.

The cluster currently has 12% of objects misplaced and is slowly grinding through the rebalance, but it is unusable for clients (even with osd_max_pg_per_osd_hard_ratio set to 32 and mon_max_pg_per_osd set to 1000).
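To see how far over the limits a cluster is, the average PG replicas per OSD is sum(pg_num × size) / number of OSDs. In Luminous an OSD stops creating new PGs once it serves more than mon_max_pg_per_osd × osd_max_pg_per_osd_hard_ratio. The pool sizes and OSD count below are hypothetical, and an average hides imbalance, so individual OSDs (see the PGS column of `ceph osd df`) can be well above it.

```python
from typing import Dict, Tuple


def pgs_per_osd(pools: Dict[str, Tuple[int, int]], num_osds: int) -> float:
    """Average PG replicas per OSD; pools maps name -> (pg_num, size)."""
    total = sum(pg_num * size for pg_num, size in pools.values())
    return total / num_osds


def check_limits(avg: float, mon_max_pg_per_osd: int, hard_ratio: float) -> str:
    """Compare an average against the soft limit and the OSD hard limit."""
    hard_limit = mon_max_pg_per_osd * hard_ratio
    if avg > hard_limit:
        return "over hard limit"
    if avg > mon_max_pg_per_osd:
        return "over mon_max_pg_per_osd"
    return "ok"
```

For example, two hypothetical 3x pools of 8192 and 4096 PGs on 48 OSDs average 768 PG replicas per OSD: "ok" with the raised limits from the message (1000 and 32), but "over hard limit" with 200 and 2.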

Can I safely press on with upgrading to Nautilus in this state, so that I can enable pg-autoscaling and finally fix the problem?

thanks.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com