Yes, we lost several disks recently, and they were all removed probably
faster than they should have been (i.e. we didn't wait for them to
rebalance individually before removing more). Is there any way to map an
object or PG to a CephFS file, so at least we will know which files are
going to be corrupted if we mark them complete?

On Thu, Jun 14, 2018 at 12:13 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 14 Jun 2018, Wyllys Ingersoll wrote:
>> I cut out a HUGE list of "purged_snaps" to keep this a little shorter...
>>
>> $ cat 1.10e.txt
>> {
>>     "state": "incomplete",
>>     "snap_trimq": "[]",
>>     "snap_trimq_len": 0,
>>     "epoch": 465904,
>>     "up": [
>>         52,
>>         23,
>>         20
>>     ],
>>     "acting": [
>>         52,
>>         23,
>>         20
>>     ],
>>     "info": {
>>         "pgid": "1.10e",
>>         "last_update": "438490'293946",
>>         "last_complete": "438490'293946",
>>         "log_tail": "427182'292446",
>>         "last_user_version": 0,
>>         "last_backfill": "MIN",
> ...
>>     "peer_info": [
>>         {
>>             "peer": "5",
>>             "pgid": "1.10e",
>>             "last_update": "438490'293946",
>>             "last_complete": "438490'293946",
>>             "log_tail": "427182'292446",
>>             "last_user_version": 0,
>>             "last_backfill": "MIN",
> ...
>>         },
>>         {
>>             "peer": "10",
>>             "pgid": "1.10e",
>>             "last_update": "438490'293946",
>>             "last_complete": "438490'293946",
>>             "log_tail": "427182'292446",
>>             "last_user_version": 0,
>>             "last_backfill": "MIN",
> ...
>>         }
>>     ],
>
> It looks like all of the copies of this PG are in fact incomplete
> (partially backfilled, not the complete set of objects). You must have
> lost a disk somewhere? Is there another copy?
>
> If not, then as a last resort you can go look at each one, see which copy
> of the PG has the most objects, and mark it complete, and everything else
> will backfill from there. That is almost certainly going to admit
> defeat and lose some data, though.
>
> sage
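A rough sketch of the object-to-file mapping being asked about, assuming a
CephFS data pool: CephFS names each data object `<inode in hex>.<stripe
index>`, so the object names held by a PG identify the files they back. The
OSD data path, mount point, and example object name below are placeholders,
not values from this cluster; the `ceph-objectstore-tool` steps are
commented out because they require a stopped OSD.

```shell
#!/bin/bash
# 1. With the OSD stopped, list the objects the incomplete PG holds on
#    each candidate copy; the copy with the most objects loses the least
#    data if marked complete:
#
#      ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-52 \
#          --pgid 1.10e --op list | wc -l
#
# 2. The hex prefix of each object name is the file's inode number, so a
#    find by inode on the mounted filesystem recovers the path:
obj="10000000abc.00000000"   # hypothetical example object name
ino_hex="${obj%%.*}"         # hex inode prefix: 10000000abc
ino_dec=$((16#$ino_hex))     # same inode in decimal
echo "$ino_dec"
#      find /mnt/cephfs -inum "$ino_dec"
#
# 3. Only after recording the affected files, mark the best copy complete
#    (the last resort described above):
#
#      ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-52 \
#          --pgid 1.10e --op mark-complete
```

Repeating step 2 for every object listed from the PG yields the full set of
files that would be affected by marking it complete.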