Unfortunately, the bad disks were removed and are unavailable.

Looking at a couple of the incomplete pgs, they appear to have 0
objects in any of the osds, so at least some of the data appears to be
completely lost.

On Thu, Jun 14, 2018 at 12:57 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 14 Jun 2018, Wyllys Ingersoll wrote:
>> Yes, we lost several disks recently and they were all removed probably
>> faster than they should have been (i.e. we didn't wait for them to
>> rebalance individually before removing more).
>>
>> Is there any way to map an object or pg to a cephfs file so at least
>> we will know which files are going to be corrupted if we mark them
>> complete?
>
> No... the cluster doesn't know what objects were in the PG if the PG
> is incomplete.  It doesn't keep a parallel record of what would have been
> stored.
>
> I'd try to dig up the removed disks...
>
> sage
>
>>
>> On Thu, Jun 14, 2018 at 12:13 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > On Thu, 14 Jun 2018, Wyllys Ingersoll wrote:
>> >> I cut out a HUGE list of "purged_snaps" to keep this a little shorter...
>> >>
>> >> $ cat 1.10e.txt
>> >> {
>> >>     "state": "incomplete",
>> >>     "snap_trimq": "[]",
>> >>     "snap_trimq_len": 0,
>> >>     "epoch": 465904,
>> >>     "up": [
>> >>         52,
>> >>         23,
>> >>         20
>> >>     ],
>> >>     "acting": [
>> >>         52,
>> >>         23,
>> >>         20
>> >>     ],
>> >>     "info": {
>> >>         "pgid": "1.10e",
>> >>         "last_update": "438490'293946",
>> >>         "last_complete": "438490'293946",
>> >>         "log_tail": "427182'292446",
>> >>         "last_user_version": 0,
>> >>         "last_backfill": "MIN",
>> > ...
>> >>     "peer_info": [
>> >>         {
>> >>             "peer": "5",
>> >>             "pgid": "1.10e",
>> >>             "last_update": "438490'293946",
>> >>             "last_complete": "438490'293946",
>> >>             "log_tail": "427182'292446",
>> >>             "last_user_version": 0,
>> >>             "last_backfill": "MIN",
>> > ...
>> >>         },
>> >>         {
>> >>             "peer": "10",
>> >>             "pgid": "1.10e",
>> >>             "last_update": "438490'293946",
>> >>             "last_complete": "438490'293946",
>> >>             "log_tail": "427182'292446",
>> >>             "last_user_version": 0,
>> >>             "last_backfill": "MIN",
>> > ...
>> >>         }
>> >>     ],
>> >
>> > It looks like all of the copies of this PG are in fact incomplete
>> > (partially backfilled, not the complete set of objects).  You must have
>> > lost a disk somewhere?  Is there another copy?
>> >
>> > If not, then as a last resort you can go look at each one, see which copy
>> > of the PG has the most objects, and mark it complete, and everything else
>> > will backfill from there.  That is almost certainly going to mean
>> > admitting defeat and losing some data, though.
>> >
>> > sage
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
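
For anyone following the "look at each one, see which copy of the PG has
the most objects" step from the quoted advice, here is a rough Python
sketch that summarizes a saved PG query dump such as 1.10e.txt. It is
only an illustration, not a tool from this thread. The "info",
"peer_info", "peer", "last_update" and "last_backfill" keys are taken
from the output quoted above. Everything else is an assumption: that the
per-copy object count sits under "stats" -> "stat_sum" -> "num_objects"
(that part of the output was cut from the post), that a "last_backfill"
of "MAX" marks a fully backfilled copy, and the helper names themselves,
which are made up for this example.

```python
import json


def load_query(path):
    """Load a saved PG query dump (full JSON, without the '...' cuts)."""
    with open(path) as f:
        return json.load(f)


def summarize_copies(query):
    """Return one summary dict per copy: the primary first, then each peer."""
    copies = [("primary", query.get("info", {}))]
    for peer in query.get("peer_info", []):
        copies.append((str(peer.get("peer", "?")), peer))

    rows = []
    for name, info in copies:
        # Assumption: the object count lives under stats.stat_sum.num_objects.
        # It is reported as None when that part of the dump is missing.
        num_objects = info.get("stats", {}).get("stat_sum", {}).get("num_objects")
        last_backfill = info.get("last_backfill")
        rows.append({
            "copy": name,
            "last_update": info.get("last_update"),
            "last_backfill": last_backfill,
            # Assumption: "MAX" means backfill finished for this copy.
            "fully_backfilled": last_backfill == "MAX",
            "num_objects": num_objects,
        })
    return rows


def copy_with_most_objects(rows):
    """Pick the copy reporting the most objects, or None if no copy reports one."""
    counted = [r for r in rows if r["num_objects"] is not None]
    return max(counted, key=lambda r: r["num_objects"]) if counted else None
```

Something like copy_with_most_objects(summarize_copies(load_query("1.10e.txt")))
would then point at the candidate copy. It only reads the dump; it does
not mark anything complete, and it cannot tell you which objects are
missing.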