Re: Ceph Recovery

> Op 13 mei 2016 om 11:55 schreef Lazuardi Nasution <mrxlazuardin@xxxxxxxxx>:
> 
> 
> Hi Wido,
> 
> The status is the same after 24 hours of running. It seems the status will
> not go to fully active+clean until all of the down OSDs come back. The only
> way to bring the down OSDs back is to reformat them, or to replace the HDDs
> if they have hardware issues. Do you think that is a safe way to do it?
> 
> 

Ah, you are probably lacking enough replicas to make the recovery proceed.

If that is needed, I would do this OSD by OSD. Your CRUSH map will probably tell you which OSDs you need to bring back before recovery can proceed.
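
For example, something like this should show which OSDs the stuck PGs are still waiting for (just a sketch; the PG id 1.42 below is a placeholder, substitute one from your own output):

    ceph health detail        # lists the degraded/undersized PGs by id
    ceph pg map 1.42          # shows the up and acting OSD sets for one PG
    ceph osd tree             # shows which OSDs are down and where they sit in CRUSH

If you do end up wiping and recreating an OSD, the usual per-OSD sequence (again only a sketch, assuming the failed OSD has id 12; adapt before running anything) is:

    ceph osd out 12
    service ceph stop osd.12      # or: systemctl stop ceph-osd@12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12

and then prepare the reformatted disk again with ceph-disk prepare/activate. Do one OSD at a time and wait for recovery to finish in between.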

Wido

> Best regards,
> 
> On Fri, May 13, 2016 at 4:44 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
> 
> >
> > > Op 13 mei 2016 om 11:34 schreef Lazuardi Nasution <
> > mrxlazuardin@xxxxxxxxx>:
> > >
> > >
> > > Hi,
> > >
> > > After the disaster and the restart for automatic recovery, I found the
> > > following ceph status. Some OSDs cannot be restarted due to file system
> > > corruption (it seems that XFS is fragile).
> > >
> > > [root@management-b ~]# ceph status
> > >     cluster 3810e9eb-9ece-4804-8c56-b986e7bb5627
> > >      health HEALTH_WARN
> > >             209 pgs degraded
> > >             209 pgs stuck degraded
> > >             334 pgs stuck unclean
> > >             209 pgs stuck undersized
> > >             209 pgs undersized
> > >             recovery 5354/77810 objects degraded (6.881%)
> > >             recovery 1105/77810 objects misplaced (1.420%)
> > >      monmap e1: 3 mons at {management-a=10.255.102.1:6789/0,management-b=10.255.102.2:6789/0,management-c=10.255.102.3:6789/0}
> > >             election epoch 2308, quorum 0,1,2 management-a,management-b,management-c
> > >      osdmap e25037: 96 osds: 49 up, 49 in; 125 remapped pgs
> > >             flags sortbitwise
> > >       pgmap v9024253: 2560 pgs, 5 pools, 291 GB data, 38905 objects
> > >             678 GB used, 90444 GB / 91123 GB avail
> > >             5354/77810 objects degraded (6.881%)
> > >             1105/77810 objects misplaced (1.420%)
> > >                 2226 active+clean
> > >                  209 active+undersized+degraded
> > >                  125 active+remapped
> > >   client io 0 B/s rd, 282 kB/s wr, 10 op/s
> > >
> > > Since the total number of active PGs equals the total number of PGs, and
> > > the number of degraded PGs equals the number of undersized PGs, does that
> > > mean all PGs have at least one good replica, so I can just mark the down
> > > OSDs as lost or remove them, reformat, and then restart them if there is
> > > no hardware issue with the HDDs? Which PG status should I pay more
> > > attention to because of possible lost objects: degraded or undersized?
> > >
> >
> > Yes. Your system is not reporting any inactive, unfound or stale PGs, so
> > that is good news.
> >
> > However, I recommend that you wait for the system to become fully
> > active+clean before you start removing any OSDs or formatting hard drives.
> > Better safe than sorry.
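> >
> > You can keep an eye on the recovery from another terminal while you wait
> > (nothing cluster-specific assumed here):
> >
> >     ceph -w                        # streams status and recovery progress
> >     ceph pg dump_stuck unclean     # lists the PGs that are not yet active+clean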
> >
> > Wido
> >
> > > Best regards,
> >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


