Hello Weiwen,

Thank you for the response. I've attached the output for all PGs in state
incomplete and remapped+incomplete. Thank you!

Thanks,
Jayanth Reddy

On Sun, Jun 18, 2023 at 4:09 PM Jayanth Reddy <jayanthreddy5666@xxxxxxxxx> wrote:

> Hello Weiwen,
>
> Thank you for the response. I've attached the output for all PGs in state
> incomplete and remapped+incomplete. Thank you!
>
> Thanks,
> Jayanth Reddy
>
> On Sat, Jun 17, 2023 at 11:00 PM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
>
>> Hi Jayanth,
>>
>> Can you post the complete output of “ceph pg <ID> query”? So that we can
>> understand the situation better.
>>
>> Can you get OSD 3 or 4 back into the cluster? If you are sure they cannot
>> rejoin, you may try “ceph osd lost <ID>” (the doc says this may result in
>> permanent data loss; I haven't had a chance to try it myself).
>>
>> Weiwen Hu
>>
>> > On Jun 18, 2023, at 00:26, Jayanth Reddy <jayanthreddy5666@xxxxxxxxx> wrote:
>> >
>> > Hello Nino / Users,
>> >
>> > After some initial analysis, I increased max_pg_per_osd to 480, but
>> > we're out of luck. I also tried force-backfill and force-repair.
>> > Querying a PG with "# ceph pg <pg.ID> query" shows it is blocked_by
>> > 3 to 4 OSDs that are already out of the cluster. I'm guessing these
>> > have something to do with the recovery.
>> >
>> > Thanks,
>> > Jayanth Reddy
>> >
>> >> On Sat, Jun 17, 2023 at 12:31 PM Jayanth Reddy <jayanthreddy5666@xxxxxxxxx>
>> >> wrote:
>> >>
>> >> Thanks, Nino.
>> >>
>> >> I'll give these initial suggestions a try and let you know at the
>> >> earliest.
>> >>
>> >> Regards,
>> >> Jayanth Reddy
>> >> ------------------------------
>> >> *From:* Nino Kotur <ninokotur@xxxxxxxxx>
>> >> *Sent:* Saturday, June 17, 2023 12:16:09 PM
>> >> *To:* Jayanth Reddy <jayanthreddy5666@xxxxxxxxx>
>> >> *Cc:* ceph-users@xxxxxxx <ceph-users@xxxxxxx>
>> >> *Subject:* Re: EC 8+3 Pool PGs stuck in remapped+incomplete
>> >>
>> >> The problem is simply that some of your OSDs have too many PGs, and the
>> >> pool cannot recover because it cannot create more PGs:
>> >>
>> >> [osd.214,osd.223,osd.548,osd.584] have slow ops.
>> >> too many PGs per OSD (330 > max 250)
>> >>
>> >> I'd guess the safest thing would be to permanently or temporarily add
>> >> more storage so that the PG count per OSD drops below 250. Another
>> >> option is to reduce the total number of PGs, but I don't know if I
>> >> would perform that action before my pool was healthy!
>> >>
>> >> If only one OSD has this many PGs while all the others have fewer than
>> >> 100-150, you can simply reweight the problematic OSD so it rebalances
>> >> those "too many PGs".
>> >>
>> >> But it looks to me like you have far too many PGs overall, which also
>> >> hurts performance badly.
>> >>
>> >> Another option is to raise the maximum allowed PGs per OSD to, say,
>> >> 350, which should also let the cluster rebuild. Honestly, even though
>> >> this may be the easiest option, I'd never do it; the performance cost
>> >> of having over 150 PGs per OSD is severe.
>> >>
>> >>
>> >> Kind regards,
>> >> Nino
>> >>
>> >>
>> >> On Sat, Jun 17, 2023 at 8:23 AM Jayanth Reddy <jayanthreddy5666@xxxxxxxxx>
>> >> wrote:
>> >>
>> >> Hello Users,
>> >> Greetings. We have a Ceph cluster running
>> >> *ceph version 14.2.5-382-g8881d33957
>> >> (8881d33957b54b101eae9c7627b351af10e87ee8) nautilus (stable)*
>> >>
>> >> 5 PGs belonging to our RGW 8+3 EC pool are stuck in the incomplete and
>> >> remapped+incomplete states.
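As an aside to the thread above, here is a minimal sketch, not part of the
original exchange, of how the blocked_by OSDs that "ceph pg <ID> query"
reports (the ones Jayanth says are already out of the cluster) could be
collected for each of the five stuck PGs listed just below. It assumes a
Nautilus-era ceph CLI and that jq is installed; the exact JSON path to
blocked_by varies a little between releases, so the recursive jq descent
simply gathers every occurrence.

    # PG IDs taken from the "ceph pg dump_stuck inactive" listing below.
    for pg in 15.251e 15.3f3 15.985 15.39d3 15.d46; do
        echo "== ${pg} =="
        # Collect every blocked_by array anywhere in the query JSON and
        # print the de-duplicated set of OSD ids.
        ceph pg "${pg}" query | jq '[.. | objects | .blocked_by? // empty] | flatten | unique'
    done

When reading the acting sets in that listing, note that 2147483647 is the
placeholder for "none" (an erasure-coded shard with no OSD currently
assigned), not a real OSD id.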
>> >> Below are the PGs:
>> >>
>> >> # ceph pg dump_stuck inactive
>> >> ok
>> >> PG_STAT STATE               UP                                             UP_PRIMARY ACTING                                                                                  ACTING_PRIMARY
>> >> 15.251e incomplete          [151,464,146,503,166,41,555,542,9,565,268]     151        [151,464,146,503,166,41,555,542,9,565,268]                                              151
>> >> 15.3f3  incomplete          [584,281,672,699,199,224,239,430,355,504,196]  584        [584,281,672,699,199,224,239,430,355,504,196]                                           584
>> >> 15.985  remapped+incomplete [396,690,493,214,319,209,546,91,599,237,352]   396        [2147483647,2147483647,2147483647,214,319,2147483647,546,91,599,2147483647,352]         214
>> >> 15.39d3 remapped+incomplete [404,221,223,585,38,102,533,471,568,451,195]   404        [2147483647,2147483647,223,585,38,102,533,2147483647,231,451,2147483647]                223
>> >> 15.d46  remapped+incomplete [297,646,212,254,110,169,500,372,623,470,678]  297        [2147483647,548,2147483647,2147483647,110,169,500,372,2147483647,470,678]               548
>> >>
>> >> Some of the OSDs have gone down in the cluster. Below is the ceph status:
>> >>
>> >> # ceph -s
>> >>   cluster:
>> >>     id:     30d6f7ee-fa02-4ab3-8a09-9321c8002794
>> >>     health: HEALTH_WARN
>> >>             noscrub,nodeep-scrub flag(s) set
>> >>             1 pools have many more objects per pg than average
>> >>             Reduced data availability: 5 pgs inactive, 5 pgs incomplete
>> >>             Degraded data redundancy: 44798/8718528059 objects degraded (0.001%), 1 pg degraded, 1 pg undersized
>> >>             22726 pgs not deep-scrubbed in time
>> >>             23552 pgs not scrubbed in time
>> >>             77 slow ops, oldest one blocked for 56400 sec, daemons [osd.214,osd.223,osd.548,osd.584] have slow ops.
>> >>             too many PGs per OSD (330 > max 250)
>> >>
>> >>   services:
>> >>     mon: 3 daemons, quorum brc1mon2,brc1mon3,brc1mon1 (age 2y)
>> >>     mgr: brc1mon2(active, since 8d), standbys: brc1mon1, brc1mon3
>> >>     mds: cephfs:1 {0=brc1mds2=up:active} 1 up:standby
>> >>     osd: 1012 osds: 698 up (since 14h), 698 in (since 2d); 3 remapped pgs
>> >>          flags noscrub,nodeep-scrub
>> >>     rgw: 2 daemons active (brc1rgw1, brc1rgw2)
>> >>
>> >>   data:
>> >>     pools:   17 pools, 23552 pgs
>> >>     objects: 863.74M objects, 1.2 PiB
>> >>     usage:   2.4 PiB used, 6.2 PiB / 8.6 PiB avail
>> >>     pgs:     0.021% pgs not active
>> >>              44798/8718528059 objects degraded (0.001%)
>> >>              23546 active+clean
>> >>              3     remapped+incomplete
>> >>              2     incomplete
>> >>              1     active+undersized+degraded
>> >>
>> >>   io:
>> >>     client: 24 MiB/s rd, 3.2 KiB/s wr, 56 op/s rd, 4 op/s wr
>> >>
>> >> And the health detail shows:
>> >>
>> >> # ceph health detail
>> >> HEALTH_WARN noscrub,nodeep-scrub flag(s) set; 1 pools have many more objects per pg than average; Reduced data availability: 5 pgs inactive, 5 pgs incomplete; Degraded data redundancy: 44798/8718528081 objects degraded (0.001%), 1 pg degraded, 1 pg undersized; 22726 pgs not deep-scrubbed in time; 23552 pgs not scrubbed in time; 77 slow ops, oldest one blocked for 56440 sec, daemons [osd.214,osd.223,osd.548,osd.584] have slow ops.; too many PGs per OSD (330 > max 250)
>> >> OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
>> >> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>> >>     pool iscsi-images objects per pg (540004) is more than 14.7248 times cluster average (36673)
>> >> PG_AVAILABILITY Reduced data availability: 5 pgs inactive, 5 pgs incomplete
>> >>     pg 15.3f3 is incomplete, acting [584,281,672,699,199,224,239,430,355,504,196] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
>> >>     pg 15.985 is remapped+incomplete, acting [2147483647,2147483647,2147483647,214,319,2147483647,546,91,599,2147483647,352] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
>> >>     pg 15.d46 is remapped+incomplete, acting [2147483647,548,2147483647,2147483647,110,169,500,372,2147483647,470,678] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
>> >>     pg 15.251e is incomplete, acting [151,464,146,503,166,41,555,542,9,565,268] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
>> >>     pg 15.39d3 is remapped+incomplete, acting [2147483647,2147483647,223,585,38,102,533,2147483647,231,451,2147483647] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
>> >> PG_DEGRADED Degraded data redundancy: 44798/8718528081 objects degraded (0.001%), 1 pg degraded, 1 pg undersized
>> >>     pg 15.28f0 is stuck undersized for 67359238.592403, current state active+undersized+degraded, last acting [2147483647,343,355,415,426,640,302,392,78,202,607]
>> >> PG_NOT_DEEP_SCRUBBED 22726 pgs not deep-scrubbed in time
>> >>
>> >> We have the following pools:
>> >>
>> >> # ceph osd lspools
>> >> 1 iscsi-images
>> >> 2 cephfs_data
>> >> 3 cephfs_metadata
>> >> 4 .rgw.root
>> >> 5 default.rgw.control
>> >> 6 default.rgw.meta
>> >> 7 default.rgw.log
>> >> 8 default.rgw.buckets.index
>> >> 13 rbd
>> >> 15 default.rgw.buckets.data
>> >> 16 default.rgw.buckets.non-ec
>> >> 19 cephfs_data-ec
>> >> 22 rbd-ec
>> >> 23 iscsi-images-ec
>> >> 24 hpecpool
>> >> 25 hpec.rgw.buckets.index
>> >> 26 hpec.rgw.buckets.non-ec
>> >>
>> >> We've been struggling to fix this for a long time, with no luck. Our RGW
>> >> daemons, hosted on dedicated machines behind a load balancer, keep
>> >> failing to respond; the LB returns 504 Gateway Timeout because the
>> >> daemons do not answer within the expected time. The active health checks
>> >> the LB performs on '/' via HTTP HEAD are also failing very frequently.
>> >> For now we're surviving with a script that restarts the RGW daemons
>> >> whenever the LB responds with HTTP status code 504. Any help is highly
>> >> appreciated!
>> >>
>> >> Regards,
>> >> Jayanth Reddy
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
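On Nino's point about the "too many PGs per OSD (330 > max 250)" warning and
the max_pg_per_osd bump to 480 that Jayanth mentioned (presumably
mon_max_pg_per_osd): a rough sketch of the mechanics only, not a
recommendation, assuming a Nautilus cluster managed through the centralized
config database.

    # Per-OSD placement-group counts are in the PGS column; the outliers
    # are the OSDs that reweighting or extra capacity would relieve.
    ceph osd df tree

    # Temporarily raise the per-OSD PG limit so recovery is not refused,
    # and lower it again once the pool is healthy.
    ceph config set global mon_max_pg_per_osd 480

    # If a single OSD is the outlier, a gentle reweight moves PGs off it:
    # ceph osd reweight <osd-id> 0.95

Whether to raise the limit, reweight, or add OSDs is exactly the trade-off
discussed earlier in the thread; the commands above only show how each
option is applied.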
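On the hint that "ceph health detail" keeps printing for pool
default.rgw.buckets.data: with an 8+3 erasure-coded profile, min_size
defaults to k+1 = 9, and, as the health output suggests, an incomplete PG
that can only find 8 of its 11 shards will not go active until min_size is
lowered to 8. A minimal sketch of that round trip, on the assumption that
the change is reverted as soon as the PGs recover, since running at
min_size = k leaves no margin for a further failure:

    # Confirm the current value (expected to be 9 for this 8+3 profile).
    ceph osd pool get default.rgw.buckets.data min_size

    # Temporarily allow the incomplete PGs to go active with only k = 8
    # shards present.
    ceph osd pool set default.rgw.buckets.data min_size 8

    # Watch the stuck PGs; once they are active+clean again, restore it.
    ceph pg 15.251e query | jq '.state'
    ceph osd pool set default.rgw.buckets.data min_size 9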