Thanks, I was able to get things back into a good state.

I had to restart a few OSDs, and I also noticed at one point that all of the PGs preventing full recovery involved osd.8. I removed that OSD and things moved forward. I reviewed the RAID controller logs for that OSD, and although the disk was still listed as healthy, I found some errors in the controller log that must have been causing problems reading some amount of data.

Thanks again.

Shain

On 7/23/21, 3:35 PM, "DHilsbos@xxxxxxxxxxxxxx" <DHilsbos@xxxxxxxxxxxxxx> wrote:

    Shain;

    These lines look bad:
        14 scrub errors
        Reduced data availability: 2 pgs inactive
        Possible data damage: 8 pgs inconsistent
        osd.95 (root=default,host=hqosd8) is down

    I suspect you ran into a hardware issue with one or more drives in some of the servers that did not go offline.

    osd.95 is offline; you need to resolve this.

    You should fix your tunables when you can (probably not part of your current issues).

    Thank you,

    Dominic L. Hilsbos, MBA
    Vice President – Information Technology
    Perform Air International Inc.
    DHilsbos@xxxxxxxxxxxxxx
    https://urldefense.com/v3/__http://www.PerformAir.com__;!!Iwwt!FAQkxiDS80ZWksiJket210Oc_wLsRih_-WqhguEb44tq0_Ao7aqrgeIO_C8$
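Removing a down or suspect OSD such as osd.95 generally looks something like the rough sketch below. The OSD id is taken from this thread, the exact sequence depends on how the OSD was deployed, and switching tunables to a newer profile triggers a large rebalance, so treat this as an outline rather than a definitive procedure:

    # mark the suspect OSD out so its data re-replicates elsewhere
    ceph osd out 95

    # stop the daemon on the host that carries it
    systemctl stop ceph-osd@95

    # on Luminous and later, purge removes it from the CRUSH map, auth keys, and the OSD map in one step
    ceph osd purge 95 --yes-i-really-mean-it

    # bringing the legacy tunables up to a current profile clears the warning, but expect heavy data movement
    ceph osd crush tunables optimal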
-----Original Message-----
From: Shain Miley [mailto:SMiley@xxxxxxx]
Sent: Friday, July 23, 2021 10:48 AM
To: ceph-users@xxxxxxx
Subject: Luminous won't fully recover

We recently had a few Ceph nodes go offline which required a reboot. I have been able to get the cluster back to the state listed below; however, it does not seem like it will progress past the point of 23473/287823588 objects misplaced. Yesterday about 13% of the data was misplaced; this morning it has gotten down to 0.008%, but it has not moved past this point in about an hour.

Does anyone see anything in the output below that points to the problem, and/or are there any suggestions that I can follow in order to figure out why the cluster health is not moving beyond this point?

---------------------------------------------------
root@rbd1:~# ceph -s
  cluster:
    id:     504b5794-34bd-44e7-a8c3-0494cf800c23
    health: HEALTH_ERR
            crush map has legacy tunables (require argonaut, min is firefly)
            23473/287823588 objects misplaced (0.008%)
            14 scrub errors
            Reduced data availability: 2 pgs inactive
            Possible data damage: 8 pgs inconsistent

  services:
    mon: 3 daemons, quorum hqceph1,hqceph2,hqceph3
    mgr: hqceph2(active), standbys: hqceph3
    osd: 288 osds: 270 up, 270 in; 2 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   17 pools, 9411 pgs
    objects: 95.95M objects, 309TiB
    usage:   936TiB used, 627TiB / 1.53PiB avail
    pgs:     0.021% pgs not active
             23473/287823588 objects misplaced (0.008%)
             9369 active+clean
             30   active+clean+scrubbing+deep
             8    active+clean+inconsistent
             2    activating+remapped
             2    active+clean+scrubbing

  io:
    client: 1000B/s rd, 0B/s wr, 0op/s rd, 0op/s wr

root@rbd1:~# ceph health detail
HEALTH_ERR crush map has legacy tunables (require argonaut, min is firefly); 1 osds down; 23473/287823588 objects misplaced (0.008%); 14 scrub errors; Reduced data availability: 3 pgs inactive, 13 pgs peering; Possible data damage: 8 pgs inconsistent; Degraded data redundancy: 408658/287823588 objects degraded (0.142%), 38 pgs degraded
OLD_CRUSH_TUNABLES crush map has legacy tunables (require argonaut, min is firefly)
    see https://urldefense.com/v3/__http://docs.ceph.com/docs/master/rados/operations/crush-map/*tunables__;Iw!!Iwwt!FAQkxiDS80ZWksiJket210Oc_wLsRih_-WqhguEb44tq0_Ao7aqrwpPnNRE$
OSD_DOWN 1 osds down
    osd.95 (root=default,host=hqosd8) is down
OBJECT_MISPLACED 23473/287823588 objects misplaced (0.008%)
OSD_SCRUB_ERRORS 14 scrub errors
PG_AVAILABILITY Reduced data availability: 3 pgs inactive, 13 pgs peering
    pg 3.b41 is stuck peering for 106.682058, current state peering, last acting [204,190]
    pg 3.c33 is stuck peering for 103.403643, current state peering, last acting [228,274]
    pg 3.d15 is stuck peering for 128.537454, current state peering, last acting [286,24]
    pg 3.fa9 is stuck peering for 106.526146, current state peering, last acting [286,47]
    pg 3.fb7 is stuck peering for 105.878878, current state peering, last acting [62,97]
    pg 3.13a2 is stuck peering for 106.491138, current state peering, last acting [270,219]
    pg 3.1521 is stuck inactive for 170180.165265, current state activating+remapped, last acting [94,186,188]
    pg 3.1565 is stuck peering for 106.782784, current state peering, last acting [121,60]
    pg 3.157c is stuck peering for 128.557448, current state peering, last acting [128,268]
    pg 3.1744 is stuck peering for 106.639603, current state peering, last acting [192,142]
    pg 3.1ac8 is stuck peering for 127.839550, current state peering, last acting [221,190]
    pg 3.1e24 is stuck peering for 128.201670, current state peering, last acting [118,158]
    pg 3.1e46 is stuck inactive for 169121.764376, current state activating+remapped, last acting [87,199,170]
    pg 18.36 is stuck peering for 128.554121, current state peering, last acting [204]
    pg 21.1ce is stuck peering for 106.582584, current state peering, last acting [266,192]
PG_DAMAGED Possible data damage: 8 pgs inconsistent
    pg 3.1ca is active+clean+inconsistent, acting [201,8,180]
    pg 3.56a is active+clean+inconsistent, acting [148,240,8]
    pg 3.b0f is active+clean+inconsistent, acting [148,260,8]
    pg 3.b56 is active+clean+inconsistent, acting [218,8,240]
    pg 3.10ff is active+clean+inconsistent, acting [262,8,211]
    pg 3.1192 is active+clean+inconsistent, acting [192,8,187]
    pg 3.124a is active+clean+inconsistent, acting [123,8,222]
    pg 3.1c55 is active+clean+inconsistent, acting [180,8,287]
PG_DEGRADED Degraded data redundancy: 408658/287823588 objects degraded (0.142%), 38 pgs degraded
    pg 3.8f is active+undersized+degraded, acting [163,149]
    pg 3.ba is active+undersized+degraded, acting [68,280]
    pg 3.1aa is active+undersized+degraded, acting [176,211]
    pg 3.29e is active+undersized+degraded, acting [241,194]
    pg 3.323 is active+undersized+degraded, acting [78,194]
    pg 3.343 is active+undersized+degraded, acting [242,144]
    pg 3.4ae is active+undersized+degraded, acting [153,237]
    pg 3.524 is active+undersized+degraded, acting [252,222]
    pg 3.5c9 is active+undersized+degraded, acting [272,252]
    pg 3.713 is active+undersized+degraded, acting [273,80]
    pg 3.730 is active+undersized+degraded, acting [235,212]
    pg 3.88f is active+undersized+degraded, acting [222,285]
    pg 3.8cb is active+undersized+degraded, acting [285,20]
    pg 3.9a0 is active+undersized+degraded, acting [240,200]
    pg 3.c19 is active+undersized+degraded, acting [165,276]
    pg 3.ec8 is active+undersized+degraded, acting [158,40]
    pg 3.1025 is active+undersized+degraded, acting [258,274]
    pg 3.1058 is active+undersized+degraded, acting [38,68]
    pg 3.14e4 is active+undersized+degraded, acting [185,39]
    pg 3.150c is active+undersized+degraded, acting [138,140]
    pg 3.1545 is active+undersized+degraded, acting [222,55]
    pg 3.15a6 is active+undersized+degraded, acting [242,272]
    pg 3.1620 is active+undersized+degraded, acting [200,164]
    pg 3.1710 is active+undersized+degraded, acting [176,285]
    pg 3.1792 is active+undersized+degraded, acting [190,11]
    pg 3.17bd is active+undersized+degraded, acting [207,15]
    pg 3.17da is active+undersized+degraded, acting [5,160]
    pg 3.183e is active+undersized+degraded, acting [273,136]
    pg 3.197d is active+undersized+degraded, acting [241,139]
    pg 3.1a3d is active+undersized+degraded, acting [184,121]
    pg 3.1ba6 is active+undersized+degraded, acting [47,249]
    pg 3.1c2b is active+undersized+degraded, acting [268,80]
    pg 3.1ca2 is active+undersized+degraded, acting [280,152]
    pg 3.1cd4 is active+undersized+degraded, acting [2,129]
    pg 3.1e13 is active+undersized+degraded, acting [247,114]
    pg 12.56 is active+undersized+degraded, acting [54]
    pg 18.8 is undersized+degraded+peered, acting [260]
    pg 21.9f is active+undersized+degraded, acting [215,201]
--------------------------------------------------------------------------------------------------

Thanks,
Shain

Shain Miley | Director of Platform and Infrastructure | Digital Media | smiley@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
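A rough sketch of how the common OSD and the inconsistent PGs above are usually tracked down; the PG and OSD ids come from the output in this thread, and a repair should only be requested once the inconsistency is understood, since it overwrites the copy the cluster considers bad:

    # every inconsistent PG above has osd.8 in its acting set; listing the PGs mapped to it confirms the pattern
    ceph pg ls-by-osd 8

    # show which objects and shards a scrub flagged in one of the inconsistent PGs
    rados list-inconsistent-obj 3.1ca --format=json-pretty

    # once the bad replica is understood, ask the cluster to repair the PG
    ceph pg repair 3.1ca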