Hi,

> Is your min-size at least 2? Is it just one OSD affected?

Yes, min_size is 2:

# ceph osd pool get vmtier-10 min_size
min_size: 2

Yes, only one OSD is affected.

> If yes and if it is only the journal that is corrupt, but the actual OSD
> store is intact although lagging behind now in writes and you do have
> healthy copies of its PGs elsewhere (hence the min-size requirement) you
> could resolve this situation by:

The hard disk was configured as RAID0, and during the recovery procedure we
discarded the foreign configuration by mistake and had to discard the
preserved cache from the RAID controller. In the end, though, we were able
to mount the data partition and everything seems to be working fine
(xfs_repair did not report any errors).

> 1) ensure the OSD with the corrupt journal is stopped
> 2) recreate the journal
> 3) start the OSD again.
>
> The OSD should peer its PGs and bring them on par with the other copies
> and the cluster should return to a healthy state again.
>
> See here (
> http://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-journal-failure/
> ) for a more detailed walkthrough. It talks about a failed SSD with
> journals, but the situation is the same with regard to any journal failure.
>
> Now you mentioned having set the weight to 0 in the meantime; I have no
> idea how this is going to affect the above procedure, maybe you should
> wait for somebody else to comment on this.
>
> Hope this helps a bit,
>
> -K.

Alright, I will wait for someone else's response. Anyway, thanks for your
help.

Regards,

--
Zigor Ozamiz
Departamento Técnico de Hostinet SLU
------------------------------------------
http://www.hostinet.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
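
For anyone following the same recovery path, here is a minimal shell sketch
of the three quoted steps. It assumes the affected OSD is osd.12 (a
hypothetical id), a Filestore OSD whose journal location is still the one
recorded in its configuration, and systemd-managed services; adjust the id
and the start/stop commands to your environment. This is only an
illustration of the linked walkthrough, not a tested procedure for this
particular cluster.

# Optionally prevent rebalancing while the OSD is briefly down
ceph osd set noout

# 1) Ensure the OSD with the corrupt journal is stopped
systemctl stop ceph-osd@12        # or: service ceph stop osd.12

# 2) Recreate the journal in place (ceph-osd reads the journal
#    location from the OSD's configuration / journal symlink)
ceph-osd -i 12 --mkjournal

# 3) Start the OSD again and let it peer its PGs
systemctl start ceph-osd@12       # or: service ceph start osd.12

ceph osd unset noout
ceph -s                           # watch the PGs peer and recover

How the weight having been set to 0 in the meantime interacts with this is
exactly the open question above, so that part is left out of the sketch.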