On Mon, 25 Mar 2019, Rafał Wądołowski wrote:
> Hi,
>
> On one of our clusters (3400 OSDs, ~25PB, 12.2.4), we increased pg_num &
> pgp_num on one pool (EC 4+2) from 32k to 64k. After that the cluster was
> unstable for about an hour: PGs were inactive (some activating, some
> peering).
>
> Any idea which bottleneck we hit? Any ideas on what I should change in the
> Ceph or OS configuration?
Could be lots of things.
What does 'ceph tell <pgid> query' show for one of the activating or
peering PGs?
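(A rough sketch of how to find such a PG; the pgid 5.1a2b and the output
file name below are just placeholders:

    # list PGs stuck in inactive states
    ceph pg dump_stuck inactive

    # or filter by state directly
    ceph pg ls peering
    ceph pg ls activating

    # dump the full peering state for one of them (placeholder pgid)
    ceph pg 5.1a2b query > pg-5.1a2b-query.json

The recovery_state section of that JSON usually shows which peering state
the PG is in and what, if anything, it is blocked on.)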
Note that you're moving roughly half of the data around in your cluster with
that change, so you will see each of those PGs cycle through backfill ->
peering -> activating -> active over the course of the move.
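If it helps to watch that progression (a sketch, not a prescription;
<poolname> is a placeholder and the throttle values are just examples to be
tuned for your hardware):

    # count PGs per state while the split/backfill runs
    ceph pg dump pgs_brief 2>/dev/null | awk '{print $2}' | sort | uniq -c | sort -rn

    # per-pool recovery and client I/O rates
    ceph osd pool stats <poolname>

    # optionally slow backfill down to reduce the impact on client I/O
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'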
sage