Hi Wido,
Yes, you are right. After removing the down OSDs, reformatting them, and bringing them back up, at least until about 75% of the total OSDs were in again, my Ceph cluster is healthy again. It seems there is a high probability that the data is safe if the number of active PGs equals the total number of PGs and the number of degraded PGs equals the number of undersized PGs, but it is better to check the PGs one by one to make sure there are no incomplete, unfound and/or missing objects.
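The kind of per-PG check I mean would look roughly like this (the PG id is just a placeholder):

# Anything not active+clean, with its current state
ceph pg dump_stuck unclean

# Health detail also lists PGs with unfound objects, if there are any
ceph health detail

# Inspect one PG in detail (replace 3.1f with a real PG id)
ceph pg 3.1f query | grep -E 'unfound|missing'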
Anyway, why 75%? Can I reduce this threshold by increasing the pool's replica count (size)?
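For context, this is the kind of pool resize I am asking about, as a rough sketch ('volumes' is just an example pool name):

# Current replica count of the pool
ceph osd pool get volumes size

# Raise the replica count (and, if desired, the minimum copies for I/O)
ceph osd pool set volumes size 3
ceph osd pool set volumes min_size 2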
Best regards,
On Fri, May 13, 2016 at 5:04 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
> On 13 May 2016 at 11:55, Lazuardi Nasution <mrxlazuardin@xxxxxxxxx> wrote:
>
>
> Hi Wido,
>
> The status is the same after 24 hours of running. It seems that the status
> will not go to fully active+clean until all of the down OSDs come back. The
> only way to bring the down OSDs back is to reformat them, or to replace the
> HDDs if they have hardware issues. Do you think that is a safe way to do it?
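For reference, a rough sketch of the per-OSD replacement being discussed; OSD id 12 is only an example, and the re-creation step depends on the deployment tool in use:

# Remove the dead OSD from the cluster (example id 12)
ceph osd out 12
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12
# ...then re-create the OSD on the reformatted or replaced disk with
# ceph-disk or ceph-deploy, as usual for this cluster.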
>
Ah, you are probably lacking enough replicas to make the recovery proceed.
If that is needed, I would do this OSD by OSD. Your crushmap will probably tell you which OSDs you need to bring back before it works again.
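A quick way to see which OSDs are down and which PGs are waiting for them, as a rough sketch:

# CRUSH tree with up/down status per OSD
ceph osd tree

# PGs stuck undersized, i.e. still waiting for replicas to come back
ceph pg dump_stuck undersized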
Wido
> Best regards,
>
> On Fri, May 13, 2016 at 4:44 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
> >
> > > On 13 May 2016 at 11:34, Lazuardi Nasution <mrxlazuardin@xxxxxxxxx> wrote:
> > >
> > >
> > > Hi,
> > >
> > > After a disaster and an automatic recovery restart, I found the following
> > > ceph status. Some OSDs cannot be restarted due to file system corruption
> > > (it seems that XFS is fragile).
> > >
> > > [root@management-b ~]# ceph status
> > > cluster 3810e9eb-9ece-4804-8c56-b986e7bb5627
> > > health HEALTH_WARN
> > > 209 pgs degraded
> > > 209 pgs stuck degraded
> > > 334 pgs stuck unclean
> > > 209 pgs stuck undersized
> > > 209 pgs undersized
> > > recovery 5354/77810 objects degraded (6.881%)
> > > recovery 1105/77810 objects misplaced (1.420%)
> > > monmap e1: 3 mons at {management-a=10.255.102.1:6789/0,management-b=10.255.102.2:6789/0,management-c=10.255.102.3:6789/0}
> > > election epoch 2308, quorum 0,1,2 management-a,management-b,management-c
> > > osdmap e25037: 96 osds: 49 up, 49 in; 125 remapped pgs
> > > flags sortbitwise
> > > pgmap v9024253: 2560 pgs, 5 pools, 291 GB data, 38905 objects
> > > 678 GB used, 90444 GB / 91123 GB avail
> > > 5354/77810 objects degraded (6.881%)
> > > 1105/77810 objects misplaced (1.420%)
> > > 2226 active+clean
> > > 209 active+undersized+degraded
> > > 125 active+remapped
> > > client io 0 B/s rd, 282 kB/s wr, 10 op/s
> > >
> > > Since the number of active PGs equals the total number of PGs and the
> > > number of degraded PGs equals the number of undersized PGs, does it mean
> > > that all PGs have at least one good replica, so I can just mark the down
> > > OSDs as lost or remove them, reformat them and then restart them if there
> > > is no hardware issue with the HDDs? Which PG status should I pay more
> > > attention to, degraded or undersized, regarding the possibility of lost
> > > objects?
> > >
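For reference, a rough sketch of the commands behind that question; OSD id 12 is only an example, and 'ceph osd lost' is irreversible:

# Compare the two sets of problem PGs
ceph pg dump_stuck degraded
ceph pg dump_stuck undersized

# Only if an OSD's data is truly gone: mark it lost (destructive!)
ceph osd lost 12 --yes-i-really-mean-it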
> >
> > Yes. Your system is not reporting any inactive, unfound or stale PGs, so
> > that is good news.
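A rough sketch of how to double-check that; all of these should come back empty for a cluster in this state:

# No PGs should be stuck inactive or stale
ceph pg dump_stuck inactive
ceph pg dump_stuck stale

# No unfound objects should be reported
ceph health detail | grep unfound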
> >
> > However, I recommend that you wait for the system to become fully
> > active+clean before you start removing any OSDs or formatting hard drives.
> > Better safe than sorry.
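As a simple sketch, one way to watch the cluster converge before touching anything:

# Periodically show cluster status until everything is active+clean
watch ceph -s

# Or stream cluster events while recovery proceeds
ceph -w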
> >
> > Wido
> >
> > > Best regards,
> >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com