Good day,

I'm currently decommissioning a cluster that runs EC3+1 (rack failure domain, 5 racks); however, the cluster still has some production data on it, since I'm in the process of moving everything to our new EC8+2 cluster. It runs Luminous 12.2.13 on Ubuntu 16.04 HWE, containerized with ceph-ansible 3.2.

I currently get the error below after we lost one OSD (osd.195). I've tried repair, scrub, deep scrub, forced recovery, restarting OSDs, etc. - everything mentioned in the troubleshooting docs and suggested on IRC - but cannot for the life of me get it to resolve (a rough list of the commands I've been running is in the PS at the bottom of this mail).

What I'm seeing is that for pg 9.3dd (volume_images) the might_have_unfound list shows osd.195 as "osd is down", which I know about, but it also shows osd.316 as "not queried". Also, pg 9.3dd has four functioning OSDs in its up set, yet the acting set still carries a placeholder (2147483647) for the missing OSD.

Regards

Health detail output:

OBJECT_UNFOUND 1/501815192 objects unfound (0.000%)
    pg 9.3dd has 1 unfound objects
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 9.3d1 is active+clean+inconsistent, acting [347,316,307,249]
PG_DEGRADED Degraded data redundancy: 1219/2001837265 objects degraded (0.000%), 1 pg degraded, 1 pg undersized
    pg 9.3dd is stuck undersized for 55486.439002, current state active+recovery_wait+forced_recovery+undersized+degraded+remapped, last acting [355,2147483647,64,367]

ceph pg 9.3dd query (excerpt):

    "up": [355, 139, 64, 367],
    "acting": [355, 2147483647, 64, 367],
    ...
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2021-03-08 16:07:51.239010",
            "might_have_unfound": [
                { "osd": "64(2)",  "status": "already probed" },
                { "osd": "139(1)", "status": "already probed" },
                { "osd": "195(1)", "status": "osd is down" },
                { "osd": "316(2)", "status": "not queried" },
                { "osd": "367(3)", "status": "already probed" }
            ],
            "recovery_progress": {
                "backfill_targets": [
                    "139(1)"
    ...

ceph pg 9.3d1 query (excerpt):

{
    "state": "active+clean+inconsistent",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 168488,
    "up": [347, 316, 307, 249],
    "acting": [347, 316, 307, 249],
    "actingbackfill": ["249(3)", "307(2)", "316(1)", "347(0)"],
    ...

-- CLUSTER STATS:

  cluster:
    id:     1ea59fbe-46a4-474e-8225-a66b32ca86b7
    health: HEALTH_ERR
            1/490166525 objects unfound (0.000%)
            1 scrub errors
            Possible data damage: 1 pg inconsistent
            Degraded data redundancy: 1259/1956055233 objects degraded (0.000%), 1 pg degraded, 1 pg undersized

  services:
    mon: 3 daemons, quorum B-04-11-cephctl,B-05-11-cephctl,B-03-11-cephctl
    mgr: B-03-11-cephctl(active), standbys: B-04-11-cephctl, B-05-11-cephctl
    mds: cephfs-1/1/1 up {0=B-04-11-cephctl=up:active}, 2 up:standby
    osd: 384 osds: 383 up, 383 in; 1 remapped pgs

  data:
    pools:   11 pools, 13440 pgs
    objects: 490.17M objects, 1.35PiB
    usage:   1.88PiB used, 2.33PiB / 4.21PiB avail
    pgs:     1259/1956055233 objects degraded (0.000%)
             1/490166525 objects unfound (0.000%)
             13332 active+clean
             96    active+clean+scrubbing+deep
             10    active+clean+scrubbing
             1     active+clean+inconsistent
             1     active+recovery_wait+forced_recovery+undersized+degraded+remapped

Jeremi-Ernst Avenant, Mr.
Cloud Infrastructure Specialist
Inter-University Institute for Data Intensive Astronomy
5th Floor, Department of Physics and Astronomy, University of Cape Town
Tel: 021 959 4137
Web: www.idia.ac.za
E-mail (IDIA): jeremi@xxxxxxxxxx
Rondebosch, Cape Town, 7600, South Africa
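
PS: for reference, this is roughly what I have been running so far (a sketch from memory rather than an exact transcript, using the PG/OSD IDs shown above; the restart line is just one example OSD):

    # scrub / repair attempts on the inconsistent PG
    ceph pg deep-scrub 9.3d1
    ceph pg repair 9.3d1

    # recovery attempts on the PG with the unfound object
    ceph pg force-recovery 9.3dd
    ceph pg 9.3dd query

    # restarting the OSDs in the up/acting sets (containerized deployment,
    # so via the systemd units that wrap the containers), e.g.
    systemctl restart ceph-osd@355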
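
I have also been inspecting the two PGs with the following (again just a sketch; I may be misremembering the exact subcommand for listing the missing object):

    # list the unfound/missing object(s) in 9.3dd
    ceph pg 9.3dd list_missing

    # show the scrub inconsistency details for 9.3d1
    rados list-inconsistent-obj 9.3d1 --format=json-pretty

    # overall health summary
    ceph health detail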