Hi,

> Is your min-size at least 2? Is it just one OSD affected?

Yes, min_size is 2:

# ceph osd pool get vmtier-10 min_size
min_size: 2

Yes, only one OSD is affected.

> If yes and if it is only the journal that is corrupt, but the actual OSD
> store is intact although lagging behind now in writes and you do have
> healthy copies of its PGs elsewhere (hence the min-size requirement) you
> could resolve this situation by:

The hard disk was configured as RAID0, and during the recovery procedure we
discarded the foreign configuration by mistake and had to discard the
preserved cache from the RAID controller. In the end, though, we were able
to mount the data partition and everything seems to be working fine
(xfs_repair did not report any errors).

> 1) ensure the OSD with the corrupt journal is stopped
> 2) recreate the journal
> 3) start the OSD again.
>
> The OSD should peer its PGs and bring them on par with the other copies
> and the cluster should return to a healthy state again.
>
> See here (
> http://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-journal-failure/
> ) for a more detailed walkthrough. It talks about a failed SSD with
> journals, but the situation is the same with regard to any journal failure.
>
> Now you mentioned having set the weight to 0 in the meantime; I have no
> idea how this is going to affect the above procedure, maybe you should
> wait for somebody else to comment on this.
>
> Hope this helps a bit,
>
> -K.

Alright, I will wait for someone else's response. Anyway, thanks for your
help.

Regards,

--
Zigor Ozamiz
Departamento Técnico de Hostinet SLU
------------------------------------------
http://www.hostinet.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
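
For anyone following the same recovery path, here is a minimal shell sketch
of the three quoted steps. It assumes the affected OSD is osd.12 (a
hypothetical id), a Filestore OSD whose journal location is still the one
recorded in its configuration, and systemd-managed services; adjust the id
and the start/stop commands to your environment. This is only an
illustration of the linked walkthrough, not a tested procedure for this
particular cluster.

# Optionally prevent rebalancing while the OSD is briefly down
ceph osd set noout

# 1) Ensure the OSD with the corrupt journal is stopped
systemctl stop ceph-osd@12        # or: service ceph stop osd.12

# 2) Recreate the journal in place (ceph-osd reads the journal
#    location from the OSD's configuration / journal symlink)
ceph-osd -i 12 --mkjournal

# 3) Start the OSD again and let it peer its PGs
systemctl start ceph-osd@12       # or: service ceph start osd.12

ceph osd unset noout
ceph -s                           # watch the PGs peer and recover

How the weight having been set to 0 in the meantime interacts with this is
exactly the open question above, so that part is left out of the sketch.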