Re: PG bottlenecks

What does 'ceph osd blocked-by' show?
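
A minimal sketch of the checks being discussed in this thread, in case it
helps anyone hitting the same thing (the pg id 5.0 below is only a
placeholder):

    # which OSDs are blocking peering, and how many PGs each one blocks
    ceph osd blocked-by

    # list PGs stuck inactive (activating/peering) for longer than 300s
    ceph pg dump_stuck inactive 300

    # full peering-state detail for a single PG
    ceph tell 5.0 query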

On Mon, 25 Mar 2019, Rafał Wądołowski wrote:

> This issue happened a week ago, so I don't have the output from 'pg query'.
> 
> Now I am observing similar problems on the test cluster. The output
> from the query is attached.
> 
>   data:
>     pools:   5 pools, 32800 pgs
>     objects: 11.53E objects, 62.2GiB
>     usage:   176GiB used, 1.05TiB / 1.23TiB avail
>     pgs:     30.899% pgs not active
>              20193 active+clean
>              7573  activating+degraded
>              2525  activating
>              2460  active+recovery_wait+degraded
>              14    remapped+peering
>              11    down
>              5     activating+degraded+remapped
>              4     activating+remapped
>              3     active+recovery_wait+degraded+remapped
>              2     stale+active+clean
>              2     peering
>              2     active+clean+remapped
>              1     active+undersized+degraded
>              1     activating+undersized+degraded
>              1     active+recovery_wait+undersized+degraded
>              1     active+recovery_wait
>              1     active+recovering
>              1     active+recovering+degraded
> 
> pool 5 'test' erasure size 6 min_size 4 crush_rule 1 object_hash
> rjenkins pg_num 32768 pgp_num 32768 last_change 153 lfor 0/150 flags
> hashpspool stripe_width 16384 application rbd
> 
> It looks like the cluster is blocked by something... This cluster is
> running 12.2.11.
> 
> 
> Best Regards,
> 
> Rafał Wądołowski
> 
> On 25.03.2019 10:56, Sage Weil wrote:
> > On Mon, 25 Mar 2019, Rafał Wądołowski wrote:
> >> Hi,
> >>
> >> On one of our clusters (3400 OSDs, ~25PB, 12.2.4), we increased pg_num &
> >> pgp_num on one pool (EC 4+2) from 32k to 64k. After that the cluster
> >> was unstable for an hour; PGs were inactive (some activating, some
> >> peering).
> >>
> >> Any idea which bottleneck we hit? Any ideas on what I should change
> >> in the ceph/OS configuration?
> > Could be lots of things. 
> >
> > What does 'ceph tell <pgid> query' show for one of the activating or 
> > peering pgs?
> >
> > Note that you're moving ~half of the data around in your cluster with that
> > change, so you will see each of those PGs cycle through backfill ->
> > peering -> activating -> active in the course of it moving.
> >
> > sage
> 
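
As a footnote to Sage's point about data movement: doubling pg_num splits
every PG in two, and once pgp_num follows, CRUSH remaps roughly half of
the resulting PGs to different OSDs, which is why ~50% of the data moves.
A sketch of doing the split more gradually, assuming the pool named 'test'
from the output above (the step size is illustrative; on Luminous, pg_num
and pgp_num are set separately, and pg_num cannot be decreased again):

    # grow the pool in small steps instead of 32768 -> 65536 in one jump
    ceph osd pool set test pg_num 36864
    ceph osd pool set test pgp_num 36864

    # wait until all PGs report active+clean before the next step
    ceph -s

Each step still rewrites data, but it keeps the number of PGs peering and
activating at any one time much smaller.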
_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com
