Re: recovery_unfound

Hi Chad,

In case it's relevant we are on Nautilus 14.2.6, not Mimic.

I've followed Paul's advice and issued a "ceph osd down XXX" command for
the primary OSD in each affected PG. I've also tried a systemctl restart
of several of the primary OSDs, again with no apparent effect.
Unfortunately, we still have the same number of unfound objects.
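
For reference, this is roughly what I ran per PG (the PG id and OSD
number below are placeholders; the real ones came from "ceph health
detail"):

  # find the primary of an affected pg, then mark it down so it re-peers
  ceph pg map 2.1ab    # "... up [5,12,7] acting [5,12,7]" -> primary is osd.5
  ceph osd down 5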

ceph health shows:
HEALTH_ERR 23/435429429 objects unfound (0.000%); Possible data damage:
14 pgs recovery_unfound; Degraded data redundancy: 46/4349729009 objects
degraded (0.000%), 14 pgs degraded; 9 pgs not deep-scrubbed in time; 9
pgs not scrubbed in time

If the unfound items were caused by a faulty script of mine*, rather
than a fault with ceph, then presumably the objects are lost forever?

If so, can we identify which files are lost from the OID?
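
My guess is that if these are CephFS data objects, the OID prefix is the
file's inode number in hex, so something like the following should locate
the file (the PG id, OID and mount point below are made up) - does that
sound right?

  ceph pg 2.1ab list_unfound          # OIDs look like 10000003ab5.00000000
  printf '%d\n' 0x10000003ab5         # convert the hex inode to decimal
  find /cephfs -inum <decimal-inode>  # find the file by inode (can be slow)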

And should we now clean up by running "ceph pg xxxxx mark_unfound_lost
delete" on each affected PG?


* The script was a "rolling upgrade" helper used after upgrading Ceph; it
should have restarted all OSDs on each node sequentially. Due to an error
of mine, the script didn't wait for all the OSDs to *properly* start
before moving on to the next node. "norecover" was set, which may have
aggravated the situation.
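
Roughly what the script should have done on each node (a sketch only,
assuming the standard systemd unit names):

  systemctl restart ceph-osd.target
  # wait until no OSD reports down before moving on to the next node
  until [ -z "$(ceph osd tree down | grep 'osd\.')" ]; do
      sleep 10
  done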

thanks again,

Jake

On 2/4/20 4:35 PM, Chad William Seys wrote:
> Hi Jake and all,
>   We're having what looks to be the exact same problem.  In our case it
> happened when I was "draining" an OSD for removal.  (ceph crush
> remove...)  Adding the OSD back doesn't help workaround the bug.
> Everything is either triply replicated or EC k3m2, either of which
> should stand loss of two hosts (much less one OSD).
>   We're running 13.2.6.
>   I tried various OSD restarts, deep-scrubs, with no change. I'm leaving
> things alone hoping that croit.io will update their package to 13.2.8
> soonish.  Maybe that will help kick it in the pants.
> 
> Chad.


-- 
Jake Grimmett

MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx