Re: Remove Error - "Possible data damage: 2 pgs recovery_unfound"

Hi,

You could try following the unfound-objects part of the PG troubleshooting docs:
https://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/#unfound-objects

I had some unfound objects a while ago and managed to restore them to a
previous version.
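
Roughly what that page walks you through before giving anything up
(using the same $PG placeholder as below, in your case pg 1.37 and
1.48):

# ceph pg $PG list_unfound
# ceph pg $PG query

list_unfound shows which objects are missing, and in the query output
the "recovery_state" section has a "might_have_unfound" list of OSDs
that could still hold a copy. If one of those OSDs is merely down and
can be brought back, recovery usually finds the objects again on its
own. The "restore to a previous version" I mentioned was, if I
remember correctly, the revert variant of the command below (only an
option on replicated pools):

# ceph pg $PG mark_unfound_lost revert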

If you just want to "get rid" of the error and _really really_ don't
care about data loss, you can just tell Ceph to forget about the lost
objects within the PG(s):

"# ceph pg $PG mark_unfound_lost delete" (I'd be careful though and
exhaust every other possibility before issuing that command).
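
As far as I know, both variants only touch the unfound objects in
those two PGs; the rest of the data in them is left alone. Once
recovery has finished you can confirm the error is gone with:

# ceph health detail
# ceph -s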


Best regards

Philipp


On 8/21/20 12:03 PM, Jonathan Sélea wrote:
> Hi everyone,
> I just wanted to ask again for your opinion on this "problem"
> that I have.
> Thankful for any answer!
>
>
> On 2020-08-19 13:39, Jonathan Sélea wrote:
>> Good afternoon!
>>
>> I have a small Ceph cluster running with Proxmox. After an update
>> on one of the nodes and a reboot, everything looked fine at first.
>> But after a couple of hours, I saw this:
>>
>> root@pve2:~# ceph health detail
>> HEALTH_ERR 16/1101836 objects unfound (0.001%); Possible data damage:
>> 2 pgs recovery_unfound; Degraded data redundancy: 48/3305508 objects
>> degraded (0.001%), 2 pgs degraded, 2 pgs undersized
>> OBJECT_UNFOUND 16/1101836 objects unfound (0.001%)
>>     pg 1.37 has 6 unfound objects
>>     pg 1.48 has 10 unfound objects
>> PG_DAMAGED Possible data damage: 2 pgs recovery_unfound
>>     pg 1.37 is active+recovery_unfound+undersized+degraded+remapped,
>> acting [11,17], 6 unfound
>>     pg 1.48 is active+recovery_unfound+undersized+degraded+remapped,
>> acting [5,11], 10 unfound
>> PG_DEGRADED Degraded data redundancy: 48/3305508 objects degraded
>> (0.001%), 2 pgs degraded, 2 pgs undersized
>>     pg 1.37 is stuck undersized for 446774.454853, current state
>> active+recovery_unfound+undersized+degraded+remapped, last acting
>> [11,17]
>>     pg 1.48 is stuck undersized for 446774.459466, current state
>> active+recovery_unfound+undersized+degraded+remapped, last acting
>> [5,11]
>>
>>
>> root@pve2:~# ceph -s
>>   cluster:
>>     id:     76e70c34-bce9-4f86-b049-0054f21c3494
>>     health: HEALTH_ERR
>>             16/1101836 objects unfound (0.001%)
>>             Possible data damage: 2 pgs recovery_unfound
>>             Degraded data redundancy: 48/3305508 objects degraded
>> (0.001%), 2 pgs degraded, 2 pgs undersized
>>
>>   services:
>>     mon: 3 daemons, quorum pve3,pve1,pve2 (age 2w)
>>     mgr: pve3(active, since 2w), standbys: pve1, pve2
>>     mds: cephfs:1 {0=pve1=up:active} 2 up:standby
>>     osd: 25 osds: 25 up (since 5d), 25 in (since 8d); 2 remapped pgs
>>
>>   data:
>>     pools:   4 pools, 672 pgs
>>     objects: 1.10M objects, 2.9 TiB
>>     usage:   8.6 TiB used, 12 TiB / 21 TiB avail
>>     pgs:     48/3305508 objects degraded (0.001%)
>>              16/1101836 objects unfound (0.001%)
>>              669 active+clean
>>              2   active+recovery_unfound+undersized+degraded+remapped
>>              1   active+clean+scrubbing+deep
>>
>>   io:
>>     client:   680 B/s rd, 2.6 MiB/s wr, 0 op/s rd, 151 op/s wr
>>
>>
>> I am not really concerned about the lost data, since I am 99% sure it
>> belonged to a faulty Prometheus server anyway.
>> The question is: how can I clear the error without affecting the
>> other objects?
>>
>> Thankful for any pointers!

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



