Thanks, Nino. Will give these initial suggestions a try and let you know at the earliest.

Regards,
Jayanth Reddy

________________________________
From: Nino Kotur <ninokotur@xxxxxxxxx>
Sent: Saturday, June 17, 2023 12:16:09 PM
To: Jayanth Reddy <jayanthreddy5666@xxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: EC 8+3 Pool PGs stuck in remapped+incomplete

The problem is simply that some of your OSDs hold too many PGs, and the pool cannot recover because it cannot create more PGs:

    [osd.214,osd.223,osd.548,osd.584] have slow ops.
    too many PGs per OSD (330 > max 250)

My guess is that the safest option is to add more storage, permanently or temporarily, so that the PG count per OSD drops below 250. Another option is to reduce the total number of PGs, but I would not perform that action before the pool is healthy!

If only one OSD carries this many PGs while all the other OSDs hold fewer than 100-150, you can simply reweight the problematic OSD so its "too many" PGs rebalance elsewhere. But it looks to me like you have far too many PGs overall, which also hurts performance badly.

Another option is to increase the maximum allowed PGs per OSD to, say, 350; this should also let the cluster rebuild. Honestly, even though this may be the easiest option, I would never do it: performance suffers greatly with more than 150 PGs per OSD.

kind regards,
Nino
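[For reference, some of the options Nino describes map roughly to the commands below. This is only a sketch; the OSD id and weight are placeholders (osd.214 is taken from the slow-ops list above), not a recommendation for this particular cluster.]

To see how many PGs each OSD currently holds (PGS column):

# ceph osd df tree

To reweight a single overloaded OSD so part of its PGs move elsewhere:

# ceph osd reweight 214 0.90

To raise the per-OSD PG limit to 350 as a last resort, reverting once the pool is healthy:

# ceph config set global mon_max_pg_per_osd 350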
On Sat, Jun 17, 2023 at 8:23 AM Jayanth Reddy <jayanthreddy5666@xxxxxxxxx> wrote:

Hello Users,

Greetings. We have a Ceph cluster running ceph version 14.2.5-382-g8881d33957 (8881d33957b54b101eae9c7627b351af10e87ee8) nautilus (stable).

5 PGs belonging to our RGW 8+3 EC pool are stuck in the incomplete and remapped+incomplete states. Below are the PGs:

# ceph pg dump_stuck inactive
ok
PG_STAT STATE               UP                                             UP_PRIMARY ACTING                                                                           ACTING_PRIMARY
15.251e incomplete          [151,464,146,503,166,41,555,542,9,565,268]     151        [151,464,146,503,166,41,555,542,9,565,268]                                       151
15.3f3  incomplete          [584,281,672,699,199,224,239,430,355,504,196]  584        [584,281,672,699,199,224,239,430,355,504,196]                                    584
15.985  remapped+incomplete [396,690,493,214,319,209,546,91,599,237,352]   396        [2147483647,2147483647,2147483647,214,319,2147483647,546,91,599,2147483647,352] 214
15.39d3 remapped+incomplete [404,221,223,585,38,102,533,471,568,451,195]   404        [2147483647,2147483647,223,585,38,102,533,2147483647,231,451,2147483647]        223
15.d46  remapped+incomplete [297,646,212,254,110,169,500,372,623,470,678]  297        [2147483647,548,2147483647,2147483647,110,169,500,372,2147483647,470,678]       548

Some of the OSDs had gone down on the cluster. Below is the output of ceph status:

# ceph -s
  cluster:
    id:     30d6f7ee-fa02-4ab3-8a09-9321c8002794
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            1 pools have many more objects per pg than average
            Reduced data availability: 5 pgs inactive, 5 pgs incomplete
            Degraded data redundancy: 44798/8718528059 objects degraded (0.001%), 1 pg degraded, 1 pg undersized
            22726 pgs not deep-scrubbed in time
            23552 pgs not scrubbed in time
            77 slow ops, oldest one blocked for 56400 sec, daemons [osd.214,osd.223,osd.548,osd.584] have slow ops.
            too many PGs per OSD (330 > max 250)

  services:
    mon: 3 daemons, quorum brc1mon2,brc1mon3,brc1mon1 (age 2y)
    mgr: brc1mon2(active, since 8d), standbys: brc1mon1, brc1mon3
    mds: cephfs:1 {0=brc1mds2=up:active} 1 up:standby
    osd: 1012 osds: 698 up (since 14h), 698 in (since 2d); 3 remapped pgs
         flags noscrub,nodeep-scrub
    rgw: 2 daemons active (brc1rgw1, brc1rgw2)

  data:
    pools:   17 pools, 23552 pgs
    objects: 863.74M objects, 1.2 PiB
    usage:   2.4 PiB used, 6.2 PiB / 8.6 PiB avail
    pgs:     0.021% pgs not active
             44798/8718528059 objects degraded (0.001%)
             23546 active+clean
             3     remapped+incomplete
             2     incomplete
             1     active+undersized+degraded

  io:
    client:  24 MiB/s rd, 3.2 KiB/s wr, 56 op/s rd, 4 op/s wr

And the health detail shows:

# ceph health detail
HEALTH_WARN noscrub,nodeep-scrub flag(s) set; 1 pools have many more objects per pg than average; Reduced data availability: 5 pgs inactive, 5 pgs incomplete; Degraded data redundancy: 44798/8718528081 objects degraded (0.001%), 1 pg degraded, 1 pg undersized; 22726 pgs not deep-scrubbed in time; 23552 pgs not scrubbed in time; 77 slow ops, oldest one blocked for 56440 sec, daemons [osd.214,osd.223,osd.548,osd.584] have slow ops.; too many PGs per OSD (330 > max 250)
OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
    pool iscsi-images objects per pg (540004) is more than 14.7248 times cluster average (36673)
PG_AVAILABILITY Reduced data availability: 5 pgs inactive, 5 pgs incomplete
    pg 15.3f3 is incomplete, acting [584,281,672,699,199,224,239,430,355,504,196] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
    pg 15.985 is remapped+incomplete, acting [2147483647,2147483647,2147483647,214,319,2147483647,546,91,599,2147483647,352] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
    pg 15.d46 is remapped+incomplete, acting [2147483647,548,2147483647,2147483647,110,169,500,372,2147483647,470,678] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
    pg 15.251e is incomplete, acting [151,464,146,503,166,41,555,542,9,565,268] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
    pg 15.39d3 is remapped+incomplete, acting [2147483647,2147483647,223,585,38,102,533,2147483647,231,451,2147483647] (reducing pool default.rgw.buckets.data min_size from 9 may help; search ceph.com/docs for 'incomplete')
PG_DEGRADED Degraded data redundancy: 44798/8718528081 objects degraded (0.001%), 1 pg degraded, 1 pg undersized
    pg 15.28f0 is stuck undersized for 67359238.592403, current state active+undersized+degraded, last acting [2147483647,343,355,415,426,640,302,392,78,202,607]
PG_NOT_DEEP_SCRUBBED 22726 pgs not deep-scrubbed in time

We have the pools as below:

# ceph osd lspools
1 iscsi-images
2 cephfs_data
3 cephfs_metadata
4 .rgw.root
5 default.rgw.control
6 default.rgw.meta
7 default.rgw.log
8 default.rgw.buckets.index
13 rbd
15 default.rgw.buckets.data
16 default.rgw.buckets.non-ec
19 cephfs_data-ec
22 rbd-ec
23 iscsi-images-ec
24 hpecpool
25 hpec.rgw.buckets.index
26 hpec.rgw.buckets.non-ec

We have been struggling for a long time to fix this, but without luck!
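[For reference, the "reducing pool default.rgw.buckets.data min_size from 9 may help" hint in the health detail above would look roughly like the commands below. On an 8+3 EC pool, min_size 9 is k+1; dropping it to 8 (= k) lets PGs go active with no spare shards, so it should be treated as a temporary, last-resort step and reverted as soon as the PGs are active+clean again. Whether it helps here depends on how many shards of each incomplete PG actually survive.]

# ceph osd pool get default.rgw.buckets.data min_size
# ceph osd pool set default.rgw.buckets.data min_size 8

Once recovery completes:

# ceph osd pool set default.rgw.buckets.data min_size 9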
Our RGW daemons, hosted on dedicated machines, continuously fail to respond. They sit behind a load balancer, and the LB returns 504 Gateway Timeout because the daemons do not respond within the expected time. We perform active health checks from the LB on '/' via HTTP HEAD, but these fail as well, very frequently. Currently we are surviving with a script that restarts the RGW daemons whenever the LB responds with HTTP status code 504.

Any help is highly appreciated!

Regards,
Jayanth Reddy
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx