On 5/21/19 4:48 PM, Kevin Flöh wrote:
> Hi,
>
> we gave up on the incomplete pgs since we do not have enough complete shards to restore them. What is the procedure to get rid of these pgs?
>

You need to start with marking the OSDs as 'lost' and then you can force_create_pg to get the PGs back (empty).

Wido

> regards,
>
> Kevin
>
> On 20.05.19 9:22 a.m., Kevin Flöh wrote:
>> Hi Frédéric,
>>
>> we do not have access to the original OSDs. We exported the remaining shards of the two pgs but we are only left with two shards (of reasonable size) per pg. The rest of the shards displayed by ceph pg query are empty. I guess marking the OSD as complete doesn't make sense then.
>>
>> Best,
>> Kevin
>>
>> On 17.05.19 2:36 p.m., Frédéric Nass wrote:
>>>
>>> On 14/05/2019 at 10:04, Kevin Flöh wrote:
>>>>
>>>> On 13.05.19 11:21 p.m., Dan van der Ster wrote:
>>>>> Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs?
>>>>> It would be useful to double confirm that: check with `ceph pg <id> query` and `ceph pg dump`.
>>>>> (If so, this is why the ignore history les thing isn't helping; you don't have the minimum 3 stripes up for those 3+1 PGs.)
>>>>
>>>> yes, but as written in my other mail, we still have enough shards, at least I think so.
>>>>
>>>>> If those "lost" OSDs by some miracle still have the PG data, you might be able to export the relevant PG stripes with the ceph-objectstore-tool. I've never tried this myself, but there have been threads in the past where people export a PG from a nearly dead hdd, import to another OSD, and then backfilling works.
>>>>
>>>> guess that is not possible.
>>>
>>> Hi Kevin,
>>>
>>> You want to make sure of this.
>>>
>>> Unless you recreated the OSDs 4 and 23 and had new data written on them, they should still host the data you need.
>>> What Dan suggested (export the 7 inconsistent PGs and import them on a healthy OSD) seems to be the only way to recover your lost data, as with 4 hosts and 2 OSDs lost, you're left with 2 chunks of data/parity when you actually need 3 to access it. Reducing min_size to 3 will not help.
>>>
>>> Have a look here:
>>>
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023736.html
>>>
>>> This is probably the best way to follow from now on.
>>>
>>> Regards,
>>> Frédéric.
>>>
>>>>> If OTOH those PGs are really lost forever, and someone else should confirm what I say here, I think the next step would be to force recreate the incomplete PGs and then run a set of cephfs scrub/repair disaster recovery cmds to recover what you can from the cephfs.
>>>>>
>>>>> -- dan
>>>>
>>>> would this let us recover at least some of the data on the pgs? If not, we would just set up a new ceph directly without fixing the old one and copy whatever is left.
>>>>
>>>> Best regards,
>>>>
>>>> Kevin
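For reference, here is a rough, untested sketch of the export/import path Dan and Frédéric describe above, in case the original disks ever become readable again. It assumes the lost OSDs were 4 and 23 and the incomplete PGs are 1.5dd and 1.619 (both taken from the status output quoted below), default /var/lib/ceph/osd data paths, and a hypothetical healthy target osd.99; adjust everything to the actual environment and keep copies of the export files:

    # Double-check which OSDs the incomplete PGs map to, as Dan suggests:
    ceph pg 1.5dd query | less
    ceph pg dump pgs_brief | grep -E '^1\.(5dd|619)'

    # With the source OSD stopped, list which PG shards its store still holds
    # (EC shards show up as <pgid>s<shard>, e.g. 1.5dds1; add --journal-path
    # for FileStore OSDs):
    systemctl stop ceph-osd@4
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 --op list-pgs

    # Export the needed shard to a file:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
        --pgid 1.5dds1 --op export --file /root/1.5dds1.export

    # Import it into a stopped, healthy OSD, then start that OSD and let
    # backfill take over:
    systemctl stop ceph-osd@99
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-99 \
        --op import --file /root/1.5dds1.export
    systemctl start ceph-osd@99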
>>>>>
>>>>> On Mon, May 13, 2019 at 4:20 PM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>>>>>> Dear ceph experts,
>>>>>>
>>>>>> we have several (maybe related) problems with our ceph cluster, let me first show you the current ceph status:
>>>>>>
>>>>>>   cluster:
>>>>>>     id:     23e72372-0d44-4cad-b24f-3641b14b86f4
>>>>>>     health: HEALTH_ERR
>>>>>>             1 MDSs report slow metadata IOs
>>>>>>             1 MDSs report slow requests
>>>>>>             1 MDSs behind on trimming
>>>>>>             1/126319678 objects unfound (0.000%)
>>>>>>             19 scrub errors
>>>>>>             Reduced data availability: 2 pgs inactive, 2 pgs incomplete
>>>>>>             Possible data damage: 7 pgs inconsistent
>>>>>>             Degraded data redundancy: 1/500333881 objects degraded (0.000%), 1 pg degraded
>>>>>>             118 stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
>>>>>>
>>>>>>   services:
>>>>>>     mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>>>>>>     mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>>>>>>     mds: cephfs-1/1/1 up {0=ceph-node02.etp.kit.edu=up:active}, 3 up:standby
>>>>>>     osd: 96 osds: 96 up, 96 in
>>>>>>
>>>>>>   data:
>>>>>>     pools:   2 pools, 4096 pgs
>>>>>>     objects: 126.32M objects, 260TiB
>>>>>>     usage:   372TiB used, 152TiB / 524TiB avail
>>>>>>     pgs:     0.049% pgs not active
>>>>>>              1/500333881 objects degraded (0.000%)
>>>>>>              1/126319678 objects unfound (0.000%)
>>>>>>              4076 active+clean
>>>>>>              10   active+clean+scrubbing+deep
>>>>>>              7    active+clean+inconsistent
>>>>>>              2    incomplete
>>>>>>              1    active+recovery_wait+degraded
>>>>>>
>>>>>>   io:
>>>>>>     client: 449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr
>>>>>>
>>>>>>
>>>>>> and ceph health detail:
>>>>>>
>>>>>> HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 1 MDSs behind on trimming; 1/126319687 objects unfound (0.000%); 19 scrub errors; Reduced data availability: 2 pgs inactive, 2 pgs incomplete; Possible data damage: 7 pgs inconsistent; Degraded data redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded; 118 stuck requests are blocked > 4096 sec.
>>>>>> Implicated osds 24,32,91
>>>>>> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>>>>>     mdsceph-node02.etp.kit.edu(mds.0): 100+ slow metadata IOs are blocked > 30 secs, oldest blocked for 351193 secs
>>>>>> MDS_SLOW_REQUEST 1 MDSs report slow requests
>>>>>>     mdsceph-node02.etp.kit.edu(mds.0): 4 slow requests are blocked > 30 sec
>>>>>> MDS_TRIM 1 MDSs behind on trimming
>>>>>>     mdsceph-node02.etp.kit.edu(mds.0): Behind on trimming (46034/128) max_segments: 128, num_segments: 46034
>>>>>> OBJECT_UNFOUND 1/126319687 objects unfound (0.000%)
>>>>>>     pg 1.24c has 1 unfound objects
>>>>>> OSD_SCRUB_ERRORS 19 scrub errors
>>>>>> PG_AVAILABILITY Reduced data availability: 2 pgs inactive, 2 pgs incomplete
>>>>>>     pg 1.5dd is incomplete, acting [24,4,23,79] (reducing pool ec31 min_size from 3 may help; search ceph.com/docs for 'incomplete')
>>>>>>     pg 1.619 is incomplete, acting [91,23,4,81] (reducing pool ec31 min_size from 3 may help; search ceph.com/docs for 'incomplete')
>>>>>> PG_DAMAGED Possible data damage: 7 pgs inconsistent
>>>>>>     pg 1.17f is active+clean+inconsistent, acting [65,49,25,4]
>>>>>>     pg 1.1e0 is active+clean+inconsistent, acting [11,32,4,81]
>>>>>>     pg 1.203 is active+clean+inconsistent, acting [43,49,4,72]
>>>>>>     pg 1.5d3 is active+clean+inconsistent, acting [37,27,85,4]
>>>>>>     pg 1.779 is active+clean+inconsistent, acting [50,4,77,62]
>>>>>>     pg 1.77c is active+clean+inconsistent, acting [21,49,40,4]
>>>>>>     pg 1.7c3 is active+clean+inconsistent, acting [1,14,68,4]
>>>>>> PG_DEGRADED Degraded data redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded
>>>>>>     pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1 unfound
>>>>>> REQUEST_STUCK 118 stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
>>>>>>     118 ops are blocked > 536871 sec
>>>>>>     osds 24,32,91 have stuck requests > 536871 sec
>>>>>>
>>>>>>
>>>>>> Let me briefly summarize the setup: We have 4 nodes with 24 osds each and use 3+1 erasure coding. The nodes run on centos7 and we use, due to a major mistake when setting up the cluster, more than one ceph version on the nodes: 3 nodes run on 12.2.12 and one runs on 13.2.5. We are currently not daring to update all nodes to 13.2.5. For all the version details see:
>>>>>>
>>>>>> {
>>>>>>     "mon": {
>>>>>>         "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 3
>>>>>>     },
>>>>>>     "mgr": {
>>>>>>         "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 2
>>>>>>     },
>>>>>>     "osd": {
>>>>>>         "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 72,
>>>>>>         "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 24
>>>>>>     },
>>>>>>     "mds": {
>>>>>>         "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 4
>>>>>>     },
>>>>>>     "overall": {
>>>>>>         "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 81,
>>>>>>         "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 24
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Here is what happened: One osd daemon could not be started and therefore we decided to mark the osd as lost and set it up from scratch. Ceph started recovering and then we lost another osd with the same behavior.
>>>>>> We did the same as for the first osd. And now we are stuck with 2 pgs in incomplete. Ceph pg query gives the following problem:
>>>>>>
>>>>>>     "down_osds_we_would_probe": [],
>>>>>>     "peering_blocked_by": [],
>>>>>>     "peering_blocked_by_detail": [
>>>>>>         {
>>>>>>             "detail": "peering_blocked_by_history_les_bound"
>>>>>>         }
>>>>>>
>>>>>> We already tried to set "osd_find_best_info_ignore_history_les": "true" for the affected osds, which had no effect. Furthermore, the cluster is behind on trimming by more than 40,000 segments and we have folders and files which cannot be deleted or moved (these are not on the 2 incomplete pgs). Is there any way to solve these problems?
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Kevin
>>>>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
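For reference, a minimal sketch of the procedure Wido describes at the top of the thread for giving up on the two incomplete PGs and recreating them empty. It assumes OSDs 4 and 23 are the ones being written off and 1.5dd and 1.619 are the incomplete PGs (both from the health output above); any data still referenced by those PGs is gone for good once this is done, and the exact command names differ between releases:

    # Mark the unrecoverable OSDs as lost:
    ceph osd lost 4 --yes-i-really-mean-it
    ceph osd lost 23 --yes-i-really-mean-it

    # Recreate the incomplete PGs as empty PGs (Luminous and later; older
    # releases used `ceph pg force_create_pg <pgid>` instead):
    ceph osd force-create-pg 1.5dd
    ceph osd force-create-pg 1.619

    # Then, as Dan suggested, run the cephfs scrub/repair tooling to find out
    # what the filesystem lost, e.g. an online scrub started on the node
    # running the active MDS:
    ceph daemon mds.ceph-node02.etp.kit.edu scrub_path / recursive repair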