Presumably the 2 OSDs you marked as lost were hosting those incomplete
PGs? It would be useful to confirm that: check with `ceph pg <id> query`
and `ceph pg dump` (rough, untested command sketches below). If so, that
is why the osd_find_best_info_ignore_history_les setting isn't helping:
you don't have the minimum 3 of the 4 shards available for those 3+1 PGs.

If those "lost" OSDs by some miracle still have the PG data, you might
be able to export the relevant PG shards with ceph-objectstore-tool.
I've never tried this myself, but there have been threads in the past
where people exported a PG from a nearly dead HDD, imported it into
another OSD, and backfilling then worked.

If, on the other hand, those PGs really are lost forever (and someone
else should confirm what I say here), I think the next step would be to
force-recreate the incomplete PGs and then run the CephFS scrub/repair
disaster recovery commands to recover whatever you can from the
filesystem.

-- dan
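Rough sketches of what I mean; treat these as untested notes rather than
a recipe, and read the docs before running anything. The PG and OSD ids
are taken from your health detail below; paths and file names are just
placeholders.

1) Confirm which OSDs the incomplete PGs actually need:

    ceph pg 1.5dd query   # look at "peer_info", "past_intervals",
                          # "down_osds_we_would_probe"
    ceph pg 1.619 query
    ceph pg dump pgs | grep -E '^1\.(5dd|619)'

   If the two OSD ids you marked lost show up in the past acting sets of
   exactly these two PGs, that confirms the picture.

2) If the old disks are still readable, export/import the shards with
   ceph-objectstore-tool (with the OSD daemons stopped on both ends):

    # on the host with the dead OSD's data directory
    # (filestore OSDs may also need --journal-path)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<old-id> --op list-pgs
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<old-id> \
        --pgid <pgid-as-listed> --op export --file /tmp/pg1.5dd.export

    # on a host with a healthy OSD that should take the shard
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<new-id> \
        --op import --file /tmp/pg1.5dd.export

   Note that for an EC pool the on-disk pgid carries a shard suffix
   (something like 1.5dds0), which is why I'd run --op list-pgs first.

3) Only if the data is truly gone: force-recreate the two PGs and then
   go through the CephFS disaster recovery docs
   (http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/), e.g.

    ceph osd force-create-pg 1.5dd   # newer releases also want --yes-i-really-mean-it
    ceph osd force-create-pg 1.619
    # then cephfs-data-scan / scrub per the docs, e.g. on the active MDS host:
    ceph daemon mds.ceph-node02.etp.kit.edu scrub_path / recursive repair

   That last step is destructive for whatever lived on those PGs, so
   please get a second opinion before going down that road.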
On Mon, May 13, 2019 at 4:20 PM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>
> Dear ceph experts,
>
> we have several (maybe related) problems with our ceph cluster, let me
> first show you the current ceph status:
>
>   cluster:
>     id:     23e72372-0d44-4cad-b24f-3641b14b86f4
>     health: HEALTH_ERR
>             1 MDSs report slow metadata IOs
>             1 MDSs report slow requests
>             1 MDSs behind on trimming
>             1/126319678 objects unfound (0.000%)
>             19 scrub errors
>             Reduced data availability: 2 pgs inactive, 2 pgs incomplete
>             Possible data damage: 7 pgs inconsistent
>             Degraded data redundancy: 1/500333881 objects degraded (0.000%), 1 pg degraded
>             118 stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
>
>   services:
>     mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>     mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>     mds: cephfs-1/1/1 up {0=ceph-node02.etp.kit.edu=up:active}, 3 up:standby
>     osd: 96 osds: 96 up, 96 in
>
>   data:
>     pools:   2 pools, 4096 pgs
>     objects: 126.32M objects, 260TiB
>     usage:   372TiB used, 152TiB / 524TiB avail
>     pgs:     0.049% pgs not active
>              1/500333881 objects degraded (0.000%)
>              1/126319678 objects unfound (0.000%)
>              4076 active+clean
>              10   active+clean+scrubbing+deep
>              7    active+clean+inconsistent
>              2    incomplete
>              1    active+recovery_wait+degraded
>
>   io:
>     client: 449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr
>
> and ceph health detail:
>
> HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow requests;
> 1 MDSs behind on trimming; 1/126319687 objects unfound (0.000%); 19
> scrub errors; Reduced data availability: 2 pgs inactive, 2 pgs
> incomplete; Possible data damage: 7 pgs inconsistent; Degraded data
> redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded; 118
> stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>     mdsceph-node02.etp.kit.edu(mds.0): 100+ slow metadata IOs are blocked > 30 secs, oldest blocked for 351193 secs
> MDS_SLOW_REQUEST 1 MDSs report slow requests
>     mdsceph-node02.etp.kit.edu(mds.0): 4 slow requests are blocked > 30 sec
> MDS_TRIM 1 MDSs behind on trimming
>     mdsceph-node02.etp.kit.edu(mds.0): Behind on trimming (46034/128) max_segments: 128, num_segments: 46034
> OBJECT_UNFOUND 1/126319687 objects unfound (0.000%)
>     pg 1.24c has 1 unfound objects
> OSD_SCRUB_ERRORS 19 scrub errors
> PG_AVAILABILITY Reduced data availability: 2 pgs inactive, 2 pgs incomplete
>     pg 1.5dd is incomplete, acting [24,4,23,79] (reducing pool ec31 min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 1.619 is incomplete, acting [91,23,4,81] (reducing pool ec31 min_size from 3 may help; search ceph.com/docs for 'incomplete')
> PG_DAMAGED Possible data damage: 7 pgs inconsistent
>     pg 1.17f is active+clean+inconsistent, acting [65,49,25,4]
>     pg 1.1e0 is active+clean+inconsistent, acting [11,32,4,81]
>     pg 1.203 is active+clean+inconsistent, acting [43,49,4,72]
>     pg 1.5d3 is active+clean+inconsistent, acting [37,27,85,4]
>     pg 1.779 is active+clean+inconsistent, acting [50,4,77,62]
>     pg 1.77c is active+clean+inconsistent, acting [21,49,40,4]
>     pg 1.7c3 is active+clean+inconsistent, acting [1,14,68,4]
> PG_DEGRADED Degraded data redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded
>     pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1 unfound
> REQUEST_STUCK 118 stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
>     118 ops are blocked > 536871 sec
>     osds 24,32,91 have stuck requests > 536871 sec
>
> Let me briefly summarize the setup: We have 4 nodes with 24 osds each
> and use 3+1 erasure coding. The nodes run on centos7 and we use, due to
> a major mistake when setting up the cluster, more than one ceph version
> on the nodes, 3 nodes run on 12.2.12 and one runs on 13.2.5. We are
> currently not daring to update all nodes to 13.2.5. For all the version
> details see:
>
> {
>     "mon": {
>         "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 3
>     },
>     "mgr": {
>         "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 2
>     },
>     "osd": {
>         "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 72,
>         "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 24
>     },
>     "mds": {
>         "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 4
>     },
>     "overall": {
>         "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 81,
>         "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 24
>     }
> }
>
> Here is what happened: One osd daemon could not be started and therefore
> we decided to mark the osd as lost and set it up from scratch. Ceph
> started recovering and then we lost another osd with the same behavior.
> We did the same as for the first osd. And now we are stuck with 2 pgs in
> incomplete. Ceph pg query gives the following problem:
>
>     "down_osds_we_would_probe": [],
>     "peering_blocked_by": [],
>     "peering_blocked_by_detail": [
>         {
>             "detail": "peering_blocked_by_history_les_bound"
>         }
>
> We already tried to set "osd_find_best_info_ignore_history_les": "true"
> for the affected osds, which had no effect.
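For what it's worth, that option is read by the OSDs during peering, so
it normally has to be set for the primary OSD of each incomplete PG
(osd.24 for 1.5dd and osd.91 for 1.619, going by your health detail) and
that OSD restarted so the PG re-peers. Roughly, and untested:

    # in ceph.conf on the host carrying the primary OSD
    [osd.24]
        osd find best info ignore history les = true

    systemctl restart ceph-osd@24

But as I said above, I don't expect it to help as long as fewer than 3
of the 4 shards exist for those PGs.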
> Furthermore, the cluster is behind on trimming by more than 40,000
> segments and we have folders and files which cannot be deleted or moved
> (which are not on the 2 incomplete pgs). Is there any way to solve
> these problems?
>
> Best regards,
>
> Kevin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com