Re: EC 8+3 Pool PGs stuck in remapped+incomplete

Hello Weiwen,

Thank you for the response. I've attached the output for all PGs in state
incomplete and remapped+incomplete. Thank you!

Thanks,
Jayanth Reddy

On Sun, Jun 18, 2023 at 4:09 PM Jayanth Reddy <jayanthreddy5666@xxxxxxxxx>
wrote:

> Hello Weiwen,
>
> Thank you for the response. I've attached the output for all PGs in state
> incomplete and remapped+incomplete. Thank you!
>
> Thanks,
> Jayanth Reddy
>
> On Sat, Jun 17, 2023 at 11:00 PM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
>
>> Hi Jayanth,
>>
>> Can you post the complete output of “ceph pg <ID> query”, so that we can
>> understand the situation better?
>>
>> Can you get those 3 or 4 OSDs back into the cluster? If you are sure they
>> cannot rejoin, you may try “ceph osd lost <ID>” (the docs say this may
>> result in permanent data loss; I didn’t have a chance to try this myself).
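>>
>> A minimal sketch of both, using one of the PG IDs from the output quoted
>> below (the OSD ID is a placeholder; note that “ceph osd lost” also needs a
>> confirmation flag):
>>
>> # ceph pg 15.985 query > /tmp/pg-15.985-query.json
>> # ceph osd lost <osd-id> --yes-i-really-mean-it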
>>
>> Weiwen Hu
>>
>> > On Jun 18, 2023, at 00:26, Jayanth Reddy <jayanthreddy5666@xxxxxxxxx> wrote:
>> >
>> > Hello Nino / Users,
>> >
>> > After some initial analysis, I had increased max_pg_per_osd to 480, but
>> > we're still out of luck. I also tried force-backfill and force-repair.
>> > Querying a PG with "# ceph pg <pg.ID> query" shows it is blocked_by 3 to 4
>> > OSDs which are already out of the cluster. I'm guessing these have
>> > something to do with the recovery.
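>> >
>> > (A rough sketch of how I'm pulling that out of the query output;
>> > "blocked_by" is the field I see, and the second grep may or may not
>> > match:)
>> >
>> > # ceph pg 15.985 query | grep -A 12 '"blocked_by"'
>> > # ceph pg 15.985 query | grep -A 6 'down_osds_we_would_probe'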
>> >
>> > Thanks,
>> > Jayanth Reddy
>> >
>> >> On Sat, Jun 17, 2023 at 12:31 PM Jayanth Reddy <
>> jayanthreddy5666@xxxxxxxxx>
>> >> wrote:
>> >>
>> >> Thanks, Nino.
>> >>
>> >> I'll give these initial suggestions a try and let you know as soon as
>> >> possible.
>> >>
>> >> Regards,
>> >> Jayanth Reddy
>> >> ------------------------------
>> >> From: Nino Kotur <ninokotur@xxxxxxxxx>
>> >> Sent: Saturday, June 17, 2023 12:16:09 PM
>> >> To: Jayanth Reddy <jayanthreddy5666@xxxxxxxxx>
>> >> Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
>> >> Subject: Re: EC 8+3 Pool PGs stuck in remapped+incomplete
>> >>
>> >> The problem is just that some of your OSDs have too many PGs, and the pool
>> >> cannot recover because it cannot create more PGs:
>> >>
>> >> [osd.214,osd.223,osd.548,osd.584] have slow ops.
>> >>            too many PGs per OSD (330 > max 250)
>> >>
>> >> I'd guess the safest thing would be to permanently or temporarily add more
>> >> storage so that the PG count per OSD drops below 250. Another option is to
>> >> reduce the total number of PGs, but I don't know if I would perform that
>> >> action before my pool was healthy!
>> >>
>> >> If only one OSD has that many PGs and all the other OSDs have fewer than
>> >> 100-150, then you can just reweight the problematic OSD so that it
>> >> rebalances those "too many PGs" away.
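>> >>
>> >> For example, something along these lines (just a sketch; check the PGS
>> >> column of "ceph osd df tree" first and pick the OSD and weight to suit):
>> >>
>> >> # ceph osd df tree               # PGS column shows PGs per OSD
>> >> # ceph osd reweight <osd-id> 0.9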
>> >>
>> >> But it looks to me like you have way too many PGs overall, which also hurts
>> >> performance badly.
>> >>
>> >> Another option is to increase the maximum allowed PGs per OSD to, say, 350;
>> >> this should also let the cluster rebuild. Honestly, even though this may be
>> >> the easiest option, I'd never do it myself: the performance cost of having
>> >> over 150 PGs per OSD is severe.
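>> >>
>> >> If you do go that route, it would look roughly like this (a sketch;
>> >> mon_max_pg_per_osd is the option behind the "max 250" warning, but verify
>> >> the option name on your Nautilus build before changing it):
>> >>
>> >> # ceph config set global mon_max_pg_per_osd 350
>> >> # ceph config get mon mon_max_pg_per_osd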
>> >>
>> >>
>> >> kind regards,
>> >> Nino
>> >>
>> >>
>> >> On Sat, Jun 17, 2023 at 8:23 AM Jayanth Reddy <
>> jayanthreddy5666@xxxxxxxxx>
>> >> wrote:
>> >>
>> >> Hello Users,
>> >> Greetings. We have a Ceph cluster running
>> >> ceph version 14.2.5-382-g8881d33957
>> >> (8881d33957b54b101eae9c7627b351af10e87ee8) nautilus (stable)
>> >>
>> >> 5 PGs belonging to our RGW 8+3 EC pool are stuck in the incomplete and
>> >> remapped+incomplete states. Below are the PGs:
>> >>
>> >> # ceph pg dump_stuck inactive
>> >> ok
>> >> PG_STAT  STATE                UP                                              UP_PRIMARY  ACTING                                                                            ACTING_PRIMARY
>> >> 15.251e  incomplete           [151,464,146,503,166,41,555,542,9,565,268]      151         [151,464,146,503,166,41,555,542,9,565,268]                                        151
>> >> 15.3f3   incomplete           [584,281,672,699,199,224,239,430,355,504,196]   584         [584,281,672,699,199,224,239,430,355,504,196]                                     584
>> >> 15.985   remapped+incomplete  [396,690,493,214,319,209,546,91,599,237,352]    396         [2147483647,2147483647,2147483647,214,319,2147483647,546,91,599,2147483647,352]  214
>> >> 15.39d3  remapped+incomplete  [404,221,223,585,38,102,533,471,568,451,195]    404         [2147483647,2147483647,223,585,38,102,533,2147483647,231,451,2147483647]         223
>> >> 15.d46   remapped+incomplete  [297,646,212,254,110,169,500,372,623,470,678]   297         [2147483647,548,2147483647,2147483647,110,169,500,372,2147483647,470,678]        548
>> >>
>> >> Some of the OSDs had gone down on the cluster. Below is the output of
>> >> "# ceph -s":
>> >>
>> >> # ceph -s
>> >>  cluster:
>> >>    id:     30d6f7ee-fa02-4ab3-8a09-9321c8002794
>> >>    health: HEALTH_WARN
>> >>            noscrub,nodeep-scrub flag(s) set
>> >>            1 pools have many more objects per pg than average
>> >>            Reduced data availability: 5 pgs inactive, 5 pgs incomplete
>> >>            Degraded data redundancy: 44798/8718528059 objects degraded
>> >> (0.001%), 1 pg degraded, 1 pg undersized
>> >>            22726 pgs not deep-scrubbed in time
>> >>            23552 pgs not scrubbed in time
>> >>            77 slow ops, oldest one blocked for 56400 sec, daemons
>> >> [osd.214,osd.223,osd.548,osd.584] have slow ops.
>> >>            too many PGs per OSD (330 > max 250)
>> >>
>> >>  services:
>> >>    mon: 3 daemons, quorum brc1mon2,brc1mon3,brc1mon1 (age 2y)
>> >>    mgr: brc1mon2(active, since 8d), standbys: brc1mon1, brc1mon3
>> >>    mds: cephfs:1 {0=brc1mds2=up:active} 1 up:standby
>> >>    osd: 1012 osds: 698 up (since 14h), 698 in (since 2d); 3 remapped pgs
>> >>         flags noscrub,nodeep-scrub
>> >>    rgw: 2 daemons active (brc1rgw1, brc1rgw2)
>> >>
>> >>  data:
>> >>    pools:   17 pools, 23552 pgs
>> >>    objects: 863.74M objects, 1.2 PiB
>> >>    usage:   2.4 PiB used, 6.2 PiB / 8.6 PiB avail
>> >>    pgs:     0.021% pgs not active
>> >>             44798/8718528059 objects degraded (0.001%)
>> >>             23546 active+clean
>> >>             3     remapped+incomplete
>> >>             2     incomplete
>> >>             1     active+undersized+degraded
>> >>
>> >>  io:
>> >>    client:   24 MiB/s rd, 3.2 KiB/s wr, 56 op/s rd, 4 op/s wr
>> >>
>> >> And the health detail shows:
>> >>
>> >> # ceph health detail
>> >> HEALTH_WARN noscrub,nodeep-scrub flag(s) set; 1 pools have many more objects per pg than average; Reduced data availability: 5 pgs inactive, 5 pgs incomplete; Degraded data redundancy: 44798/8718528081 objects degraded (0.001%), 1 pg degraded, 1 pg undersized; 22726 pgs not deep-scrubbed in time; 23552 pgs not scrubbed in time; 77 slow ops, oldest one blocked for 56440 sec, daemons [osd.214,osd.223,osd.548,osd.584] have slow ops.; too many PGs per OSD (330 > max 250)
>> >> OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
>> >> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>> >>    pool iscsi-images objects per pg (540004) is more than 14.7248 times cluster average (36673)
>> >> PG_AVAILABILITY Reduced data availability: 5 pgs inactive, 5 pgs incomplete
>> >>    pg 15.3f3 is incomplete, acting [584,281,672,699,199,224,239,430,355,504,196] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
>> >>    pg 15.985 is remapped+incomplete, acting [2147483647,2147483647,2147483647,214,319,2147483647,546,91,599,2147483647,352] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
>> >>    pg 15.d46 is remapped+incomplete, acting [2147483647,548,2147483647,2147483647,110,169,500,372,2147483647,470,678] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
>> >>    pg 15.251e is incomplete, acting [151,464,146,503,166,41,555,542,9,565,268] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
>> >>    pg 15.39d3 is remapped+incomplete, acting [2147483647,2147483647,223,585,38,102,533,2147483647,231,451,2147483647] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
>> >> PG_DEGRADED Degraded data redundancy: 44798/8718528081 objects degraded (0.001%), 1 pg degraded, 1 pg undersized
>> >>    pg 15.28f0 is stuck undersized for 67359238.592403, current state active+undersized+degraded, last acting [2147483647,343,355,415,426,640,302,392,78,202,607]
>> >> PG_NOT_DEEP_SCRUBBED 22726 pgs not deep-scrubbed in time
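>> >>
>> >> (For reference, the "reducing min_size from 9 may help" hint above would
>> >> presumably correspond to the commands below; just a sketch, and since 8
>> >> equals k for this 8+3 EC pool, min_size should be raised back to 9 once
>> >> the PGs recover:)
>> >>
>> >> # ceph osd pool set default.rgw.buckets.data min_size 8
>> >> # ceph osd pool set default.rgw.buckets.data min_size 9   # after recovery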
>> >>
>> >> We have the following pools:
>> >>
>> >> # ceph osd lspools
>> >> 1 iscsi-images
>> >> 2 cephfs_data
>> >> 3 cephfs_metadata
>> >> 4 .rgw.root
>> >> 5 default.rgw.control
>> >> 6 default.rgw.meta
>> >> 7 default.rgw.log
>> >> 8 default.rgw.buckets.index
>> >> 13 rbd
>> >> 15 default.rgw.buckets.data
>> >> 16 default.rgw.buckets.non-ec
>> >> 19 cephfs_data-ec
>> >> 22 rbd-ec
>> >> 23 iscsi-images-ec
>> >> 24 hpecpool
>> >> 25 hpec.rgw.buckets.index
>> >> 26 hpec.rgw.buckets.non-ec
>> >>
>> >>
>> >> We've been struggling for a long time to fix this, but with no luck! Our
>> >> RGW daemons, hosted on dedicated machines behind a load balancer, are
>> >> continuously failing to respond; the LB throws 504 Gateway Timeout because
>> >> the daemons do not respond within the expected time. We perform active
>> >> health checks from the LB on '/' via HTTP HEAD, but these are failing very
>> >> frequently as well. Currently we're surviving with a script that restarts
>> >> the RGW daemons whenever the LB responds with HTTP status code 504. Any
>> >> help is highly appreciated!
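>> >>
>> >> (For context, a minimal sketch of that kind of watchdog; the endpoint URL
>> >> and systemd unit name below are placeholders rather than our exact values:)
>> >>
>> >> #!/bin/bash
>> >> # Restart RGW when the LB front end starts returning 504 on HEAD /
>> >> ENDPOINT="http://rgw-lb.example.internal/"   # placeholder LB endpoint
>> >> UNIT="ceph-radosgw@rgw.brc1rgw1.service"     # placeholder unit name
>> >> while true; do
>> >>     code=$(curl -sI -o /dev/null -w '%{http_code}' --max-time 10 "$ENDPOINT")
>> >>     if [ "$code" = "504" ]; then
>> >>         systemctl restart "$UNIT"
>> >>     fi
>> >>     sleep 30
>> >> done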
>> >>
>> >> Regards,
>> >> Jayanth Reddy
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



