On Mon, 25 Mar 2019, Rafał Wądołowski wrote:
> Hi,
>
> On one of our clusters (3400 OSDs, ~25PB, 12.2.4), we increased pg_num &
> pgp_num on one pool (EC 4+2) from 32k to 64k. After that the cluster was
> unstable for about an hour: PGs were inactive (some activating, some
> peering).
>
> Any idea which bottleneck we hit? Any ideas on what I should change in the
> Ceph or OS configuration?
Could be lots of things.
What does 'ceph tell <pgid> query' show for one of the activating or
peering PGs?
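(A rough sketch of how to find such a PG; the pgid 5.1a2b and the output
file name below are just placeholders:

    # list PGs stuck in inactive states
    ceph pg dump_stuck inactive

    # or filter by state directly
    ceph pg ls peering
    ceph pg ls activating

    # dump the full peering state for one of them (placeholder pgid)
    ceph pg 5.1a2b query > pg-5.1a2b-query.json

The recovery_state section of that JSON usually shows which peering state
the PG is in and what, if anything, it is blocked on.)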
Note that you're moving roughly half of the data around in your cluster with
that change, so you will see each of those PGs cycle through backfill ->
peering -> activating -> active over the course of the move.
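If it helps to watch that progression (a sketch, not a prescription;
<poolname> is a placeholder and the throttle values are just examples to be
tuned for your hardware):

    # count PGs per state while the split/backfill runs
    ceph pg dump pgs_brief 2>/dev/null | awk '{print $2}' | sort | uniq -c | sort -rn

    # per-pool recovery and client I/O rates
    ceph osd pool stats <poolname>

    # optionally slow backfill down to reduce the impact on client I/O
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'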
sage