Yes, we lost several disks recently, and they were all removed probably
faster than they should have been (i.e. we didn't wait for them to
rebalance individually before removing more). Is there any way to map an
object or PG to a CephFS file, so at least we will know which files are
going to be corrupted if we mark them complete?

On Thu, Jun 14, 2018 at 12:13 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 14 Jun 2018, Wyllys Ingersoll wrote:
>> I cut out a HUGE list of "purged_snaps" to keep this a little shorter...
>>
>> $ cat 1.10e.txt
>> {
>>     "state": "incomplete",
>>     "snap_trimq": "[]",
>>     "snap_trimq_len": 0,
>>     "epoch": 465904,
>>     "up": [
>>         52,
>>         23,
>>         20
>>     ],
>>     "acting": [
>>         52,
>>         23,
>>         20
>>     ],
>>     "info": {
>>         "pgid": "1.10e",
>>         "last_update": "438490'293946",
>>         "last_complete": "438490'293946",
>>         "log_tail": "427182'292446",
>>         "last_user_version": 0,
>>         "last_backfill": "MIN",
> ...
>>     "peer_info": [
>>         {
>>             "peer": "5",
>>             "pgid": "1.10e",
>>             "last_update": "438490'293946",
>>             "last_complete": "438490'293946",
>>             "log_tail": "427182'292446",
>>             "last_user_version": 0,
>>             "last_backfill": "MIN",
> ...
>>         },
>>         {
>>             "peer": "10",
>>             "pgid": "1.10e",
>>             "last_update": "438490'293946",
>>             "last_complete": "438490'293946",
>>             "log_tail": "427182'292446",
>>             "last_user_version": 0,
>>             "last_backfill": "MIN",
> ...
>>         }
>>     ],
>
> It looks like all of the copies of this PG are in fact incomplete
> (partially backfilled, not the complete set of objects). You must have
> lost a disk somewhere? Is there another copy?
>
> If not, then as a last resort you can go look at each one, see which copy
> of the PG has the most objects, and mark it complete, and everything else
> will backfill from there. That is almost certainly going to admit
> defeat and lose some data, though.
>
> sage
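A rough sketch of the object-to-file mapping being asked about, assuming a
CephFS data pool: CephFS names each data object `<inode in hex>.<stripe
index>`, so the object names held by a PG identify the files they back. The
OSD data path, mount point, and example object name below are placeholders,
not values from this cluster; the `ceph-objectstore-tool` steps are
commented out because they require a stopped OSD.

```shell
#!/bin/bash
# 1. With the OSD stopped, list the objects the incomplete PG holds on
#    each candidate copy; the copy with the most objects loses the least
#    data if marked complete:
#
#      ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-52 \
#          --pgid 1.10e --op list | wc -l
#
# 2. The hex prefix of each object name is the file's inode number, so a
#    find by inode on the mounted filesystem recovers the path:
obj="10000000abc.00000000"   # hypothetical example object name
ino_hex="${obj%%.*}"         # hex inode prefix: 10000000abc
ino_dec=$((16#$ino_hex))     # same inode in decimal
echo "$ino_dec"
#      find /mnt/cephfs -inum "$ino_dec"
#
# 3. Only after recording the affected files, mark the best copy complete
#    (the last resort described above):
#
#      ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-52 \
#          --pgid 1.10e --op mark-complete
```

Repeating step 2 for every object listed from the PG yields the full set of
files that would be affected by marking it complete.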