Re: Nautilus cluster damaged + crashing OSDs

Hi!

Yes, it looks like you hit the same bug.

My corruption back then happened because the server ran out of memory, so the OSDs restarted and crashed again and
again for quite some time...

What I think happens is that the PG logs somehow get out of sync between OSDs, which should definitely not happen
under the intended consistency guarantees.

However, back then I managed to resolve it by deleting the PG copy with the older log (under the assumption that the
newer one is the authoritative one). Of course this only works if enough shards of that PG remain available; the
regular recovery process will then restore the missing shards.
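In case it helps, the manual version of that step looks roughly like this with ceph-objectstore-tool. This is only a sketch: the OSD id (12) and PG id (2.1a) are made-up placeholders for illustration, and you should always export a backup before removing anything.

```shell
# Hypothetical IDs: OSD 12 holds the shard of PG 2.1a whose log diverged.
# Stop the OSD first so the object store is quiescent.
systemctl stop ceph-osd@12

# Export the shard as a backup before deleting it.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --pgid 2.1a --op export --file /root/pg-2.1a-osd12.export

# Remove the diverged copy; recovery rebuilds it from the remaining shards.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --pgid 2.1a --op remove --force

systemctl start ceph-osd@12
```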

I hope my script still works for you. If you need any help, I'll see what I can do :)
If things fail, you can still manually import the exported-and-deleted PGs back into any OSD (which will probably
cause the other OSDs of the PG to crash again, since the logs will once again not overlap).
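Importing an exported shard back is the reverse operation, again sketched with the same hypothetical IDs and export file path as above; the target OSD must be stopped while you run it.

```shell
# Hypothetical IDs: re-import the previously exported shard into OSD 12.
systemctl stop ceph-osd@12

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --op import --file /root/pg-2.1a-osd12.export

systemctl start ceph-osd@12
```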


Cheers
  -- Jonas

On 21/04/2020 11.26, Robert Sander wrote:
> Hi,
> 
> On 21.04.20 10:33, Paul Emmerich wrote:
>> On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>>
>>> Wait for recovery to finish so you know whether any data from the down
>>> OSDs is required. If not just reprovision them.
>>
>> Recovery will not finish from this state as several PGs are down and/or stale.
>>
> 
> Thanks for your input so far.
> 
> It looks like this issue: https://tracker.ceph.com/issues/36337
> We will try to use the linked Python script to repair the OSD.
> ceph-bluestore-tool repair did not find anything.
> 
> Regards
> 
> 
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> 
