Hello Anthony / Users,

After some initial analysis I had increased max_pg_per_osd to 480, but we're still out of luck. I also tried force-backfill and force-repair. On querying the PGs with "# ceph pg <pg.ID> query", the output lists 3 to 4 OSDs under blocked_by that are already out of the cluster, and I suspect these have something to do with the stalled recovery.

Thanks,
Jayanth Reddy

On Sat, Jun 17, 2023 at 4:17 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:

> Your cluster's configuration is preventing CRUSH from calculating full
> placements.
>
> Set max_pg_per_osd = 1000, either in central config or in ceph.conf if you
> have it set there now.
>
> If you have it set in ceph.conf, you may need to serially restart the mons.
>
> ceph osd down 214
> sleep 60
> ceph osd down 223
> sleep 60
> ceph osd down 548
> sleep 60
> ceph osd down 584
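For reference, a minimal sketch of applying the suggestion above and of pulling the blocked_by list out of a stuck PG's query output. The central-config option name (mon_max_pg_per_osd) and the jq filter are assumptions to verify against this release before running anything:

# Raise the PG-per-OSD limit in central config (assumed option: mon_max_pg_per_osd)
ceph config set global mon_max_pg_per_osd 1000

# Collect every blocked_by entry found anywhere in a stuck PG's query output (needs jq)
ceph pg 15.3f3 query | jq '[.. | .blocked_by? // empty] | flatten | unique'

# Check whether a reported blocker still exists in the OSD map
ceph osd find <osd-id>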
>
> > On Jun 17, 2023, at 2:22 AM, Jayanth Reddy <jayanthreddy5666@xxxxxxxxx> wrote:
> >
> > Hello Users,
> > Greetings. We've a Ceph cluster with the version
> > ceph version 14.2.5-382-g8881d33957 (8881d33957b54b101eae9c7627b351af10e87ee8) nautilus (stable)
> >
> > 5 PGs belonging to our RGW 8+3 EC pool are stuck in incomplete and
> > incomplete+remapped states. Below are the PGs,
> >
> > # ceph pg dump_stuck inactive
> > ok
> > PG_STAT  STATE                UP                                             UP_PRIMARY  ACTING                                                                            ACTING_PRIMARY
> > 15.251e  incomplete           [151,464,146,503,166,41,555,542,9,565,268]     151         [151,464,146,503,166,41,555,542,9,565,268]                                        151
> > 15.3f3   incomplete           [584,281,672,699,199,224,239,430,355,504,196]  584         [584,281,672,699,199,224,239,430,355,504,196]                                     584
> > 15.985   remapped+incomplete  [396,690,493,214,319,209,546,91,599,237,352]   396         [2147483647,2147483647,2147483647,214,319,2147483647,546,91,599,2147483647,352]  214
> > 15.39d3  remapped+incomplete  [404,221,223,585,38,102,533,471,568,451,195]   404         [2147483647,2147483647,223,585,38,102,533,2147483647,231,451,2147483647]         223
> > 15.d46   remapped+incomplete  [297,646,212,254,110,169,500,372,623,470,678]  297         [2147483647,548,2147483647,2147483647,110,169,500,372,2147483647,470,678]        548
> >
> > Some of the OSDs had gone down on the cluster. Below is the # ceph status
> >
> > # ceph -s
> >   cluster:
> >     id:     30d6f7ee-fa02-4ab3-8a09-9321c8002794
> >     health: HEALTH_WARN
> >             noscrub,nodeep-scrub flag(s) set
> >             1 pools have many more objects per pg than average
> >             Reduced data availability: 5 pgs inactive, 5 pgs incomplete
> >             Degraded data redundancy: 44798/8718528059 objects degraded (0.001%), 1 pg degraded, 1 pg undersized
> >             22726 pgs not deep-scrubbed in time
> >             23552 pgs not scrubbed in time
> >             77 slow ops, oldest one blocked for 56400 sec, daemons [osd.214,osd.223,osd.548,osd.584] have slow ops.
> >             too many PGs per OSD (330 > max 250)
> >
> >   services:
> >     mon: 3 daemons, quorum brc1mon2,brc1mon3,brc1mon1 (age 2y)
> >     mgr: brc1mon2(active, since 8d), standbys: brc1mon1, brc1mon3
> >     mds: cephfs:1 {0=brc1mds2=up:active} 1 up:standby
> >     osd: 1012 osds: 698 up (since 14h), 698 in (since 2d); 3 remapped pgs
> >          flags noscrub,nodeep-scrub
> >     rgw: 2 daemons active (brc1rgw1, brc1rgw2)
> >
> >   data:
> >     pools:   17 pools, 23552 pgs
> >     objects: 863.74M objects, 1.2 PiB
> >     usage:   2.4 PiB used, 6.2 PiB / 8.6 PiB avail
> >     pgs:     0.021% pgs not active
> >              44798/8718528059 objects degraded (0.001%)
> >              23546 active+clean
> >              3     remapped+incomplete
> >              2     incomplete
> >              1     active+undersized+degraded
> >
> >   io:
> >     client: 24 MiB/s rd, 3.2 KiB/s wr, 56 op/s rd, 4 op/s wr
> >
> > And the health detail shows as
> >
> > # ceph health detail
> > HEALTH_WARN noscrub,nodeep-scrub flag(s) set; 1 pools have many more objects per pg than average; Reduced data availability: 5 pgs inactive, 5 pgs incomplete; Degraded data redundancy: 44798/8718528081 objects degraded (0.001%), 1 pg degraded, 1 pg undersized; 22726 pgs not deep-scrubbed in time; 23552 pgs not scrubbed in time; 77 slow ops, oldest one blocked for 56440 sec, daemons [osd.214,osd.223,osd.548,osd.584] have slow ops.; too many PGs per OSD (330 > max 250)
> > OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
> > MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
> >     pool iscsi-images objects per pg (540004) is more than 14.7248 times cluster average (36673)
> > PG_AVAILABILITY Reduced data availability: 5 pgs inactive, 5 pgs incomplete
> >     pg 15.3f3 is incomplete, acting [584,281,672,699,199,224,239,430,355,504,196] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
> >     pg 15.985 is remapped+incomplete, acting [2147483647,2147483647,2147483647,214,319,2147483647,546,91,599,2147483647,352] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
> >     pg 15.d46 is remapped+incomplete, acting [2147483647,548,2147483647,2147483647,110,169,500,372,2147483647,470,678] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
> >     pg 15.251e is incomplete, acting [151,464,146,503,166,41,555,542,9,565,268] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
> >     pg 15.39d3 is remapped+incomplete, acting [2147483647,2147483647,223,585,38,102,533,2147483647,231,451,2147483647] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
> > PG_DEGRADED Degraded data redundancy: 44798/8718528081 objects degraded (0.001%), 1 pg degraded, 1 pg undersized
> >     pg 15.28f0 is stuck undersized for 67359238.592403, current state active+undersized+degraded, last acting [2147483647,343,355,415,426,640,302,392,78,202,607]
> > PG_NOT_DEEP_SCRUBBED 22726 pgs not deep-scrubbed in time
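Regarding the "reducing pool default.rgw.buckets.data min_size from 9 may help" hint above: on an 8+3 EC pool, min_size 9 is k+1 and 8 equals k, so 8 is as far down as it should go, and only temporarily to let the incomplete PGs peer (running at k leaves no redundancy margin for writes). A minimal sketch, to be reverted as soon as the PGs go active:

ceph osd pool get default.rgw.buckets.data min_size
ceph osd pool set default.rgw.buckets.data min_size 8
# ...wait for the incomplete PGs to peer and recover, then restore...
ceph osd pool set default.rgw.buckets.data min_size 9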
> >
> > We've the pools as below
> >
> > # ceph osd lspools
> > 1 iscsi-images
> > 2 cephfs_data
> > 3 cephfs_metadata
> > 4 .rgw.root
> > 5 default.rgw.control
> > 6 default.rgw.meta
> > 7 default.rgw.log
> > 8 default.rgw.buckets.index
> > 13 rbd
> > 15 default.rgw.buckets.data
> > 16 default.rgw.buckets.non-ec
> > 19 cephfs_data-ec
> > 22 rbd-ec
> > 23 iscsi-images-ec
> > 24 hpecpool
> > 25 hpec.rgw.buckets.index
> > 26 hpec.rgw.buckets.non-ec
> >
> > We've been struggling for a long time to fix this, with no luck so far. Our RGW
> > daemons, hosted on dedicated machines, are continuously failing to respond; since
> > they sit behind a load balancer, the LB throws 504 Gateway Timeout when the daemons
> > do not respond within the expected time. We perform active health checks from the
> > LB on '/' via HTTP HEAD, but these are failing as well, very frequently. Currently
> > we are surviving with a script that restarts the RGW daemons whenever the LB
> > responds with HTTP status code 504. Any help is highly appreciated!
> >
> > Regards,
> > Jayanth Reddy
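As for the stop-gap mentioned above, a rough sketch of that kind of 504 watchdog, assuming systemd-managed radosgw instances; the unit name, LB endpoint, and intervals below are placeholders for illustration only:

#!/usr/bin/env bash
# Hypothetical watchdog: restart the local radosgw whenever the LB health
# check starts answering with 504. Unit name and endpoint are placeholders.
ENDPOINT="http://rgw-lb.example.internal/"
UNIT="ceph-radosgw@rgw.brc1rgw1"

while true; do
    # HEAD request against the LB, mirroring its own '/' health check
    code=$(curl -sI -o /dev/null -w '%{http_code}' --max-time 10 "$ENDPOINT")
    if [ "$code" = "504" ]; then
        logger -t rgw-watchdog "LB returned 504, restarting ${UNIT}"
        systemctl restart "${UNIT}"
        sleep 120   # give the daemon time to come back before probing again
    fi
    sleep 15
done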