Good day,

I'm currently decommissioning a cluster that runs EC3+1 (rack failure domain, 5 racks); however, the cluster still has some production data on it, since I'm in the process of moving everything to our new EC8+2 cluster. It runs Luminous 12.2.13 on Ubuntu 16.04 HWE, containerized with ceph-ansible 3.2.

I currently get the error below after we lost one OSD (osd.195). I've tried repair, scrub, deep scrub, forced recovery, restarting OSDs, etc. - everything mentioned in the troubleshooting docs and suggested on IRC - but cannot for the life of me get it to resolve (a rough list of the commands I've been running is in the PS at the bottom of this mail).

What I'm seeing is that for pg 9.3dd (volume_images) the might_have_unfound list shows osd.195 as "osd is down", which I know about, but it also shows osd.316 as "not queried". Also, pg 9.3dd has four functioning OSDs in its up set, yet the acting set still carries a placeholder (2147483647) for the missing OSD.

Regards

Health detail output:

OBJECT_UNFOUND 1/501815192 objects unfound (0.000%)
    pg 9.3dd has 1 unfound objects
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 9.3d1 is active+clean+inconsistent, acting [347,316,307,249]
PG_DEGRADED Degraded data redundancy: 1219/2001837265 objects degraded (0.000%), 1 pg degraded, 1 pg undersized
    pg 9.3dd is stuck undersized for 55486.439002, current state active+recovery_wait+forced_recovery+undersized+degraded+remapped, last acting [355,2147483647,64,367]

ceph pg 9.3dd query (excerpt):

    "up": [355, 139, 64, 367],
    "acting": [355, 2147483647, 64, 367],
    ...
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2021-03-08 16:07:51.239010",
            "might_have_unfound": [
                { "osd": "64(2)",  "status": "already probed" },
                { "osd": "139(1)", "status": "already probed" },
                { "osd": "195(1)", "status": "osd is down" },
                { "osd": "316(2)", "status": "not queried" },
                { "osd": "367(3)", "status": "already probed" }
            ],
            "recovery_progress": {
                "backfill_targets": [
                    "139(1)"
    ...

ceph pg 9.3d1 query (excerpt):

{
    "state": "active+clean+inconsistent",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 168488,
    "up": [347, 316, 307, 249],
    "acting": [347, 316, 307, 249],
    "actingbackfill": ["249(3)", "307(2)", "316(1)", "347(0)"],
    ...

-- CLUSTER STATS:

  cluster:
    id:     1ea59fbe-46a4-474e-8225-a66b32ca86b7
    health: HEALTH_ERR
            1/490166525 objects unfound (0.000%)
            1 scrub errors
            Possible data damage: 1 pg inconsistent
            Degraded data redundancy: 1259/1956055233 objects degraded (0.000%), 1 pg degraded, 1 pg undersized

  services:
    mon: 3 daemons, quorum B-04-11-cephctl,B-05-11-cephctl,B-03-11-cephctl
    mgr: B-03-11-cephctl(active), standbys: B-04-11-cephctl, B-05-11-cephctl
    mds: cephfs-1/1/1 up {0=B-04-11-cephctl=up:active}, 2 up:standby
    osd: 384 osds: 383 up, 383 in; 1 remapped pgs

  data:
    pools:   11 pools, 13440 pgs
    objects: 490.17M objects, 1.35PiB
    usage:   1.88PiB used, 2.33PiB / 4.21PiB avail
    pgs:     1259/1956055233 objects degraded (0.000%)
             1/490166525 objects unfound (0.000%)
             13332 active+clean
             96    active+clean+scrubbing+deep
             10    active+clean+scrubbing
             1     active+clean+inconsistent
             1     active+recovery_wait+forced_recovery+undersized+degraded+remapped

Jeremi-Ernst Avenant, Mr.
Cloud Infrastructure Specialist
Inter-University Institute for Data Intensive Astronomy
5th Floor, Department of Physics and Astronomy, University of Cape Town
Tel: 021 959 4137
Web: www.idia.ac.za
E-mail (IDIA): jeremi@xxxxxxxxxx
Rondebosch, Cape Town, 7600, South Africa
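
PS: for reference, this is roughly what I have been running so far (a sketch from memory rather than an exact transcript, using the PG/OSD IDs shown above; the restart line is just one example OSD):

    # scrub / repair attempts on the inconsistent PG
    ceph pg deep-scrub 9.3d1
    ceph pg repair 9.3d1

    # recovery attempts on the PG with the unfound object
    ceph pg force-recovery 9.3dd
    ceph pg 9.3dd query

    # restarting the OSDs in the up/acting sets (containerized deployment,
    # so via the systemd units that wrap the containers), e.g.
    systemctl restart ceph-osd@355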
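
I have also been inspecting the two PGs with the following (again just a sketch; I may be misremembering the exact subcommand for listing the missing object):

    # list the unfound/missing object(s) in 9.3dd
    ceph pg 9.3dd list_missing

    # show the scrub inconsistency details for 9.3d1
    rados list-inconsistent-obj 9.3d1 --format=json-pretty

    # overall health summary
    ceph health detail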