> On 14 May 2016 at 12:36, Lazuardi Nasution <mrxlazuardin@xxxxxxxxx> wrote:
>
>
> Hi Wido,
>
> Yes, you are right. After removing the down OSDs, reformatting them and
> bringing them up again, at least until 75% of the total OSDs were back,
> my Ceph cluster is healthy again. It seems there is a high probability of
> data safety if the total number of active PGs equals the total number of
> PGs and the total number of degraded PGs equals the total number of
> undersized PGs, but it is better to check the PGs one by one to make sure
> there are no incomplete, unfound and/or missing objects.
>
> Anyway, why 75%? Can I reduce this value by resizing (increasing) the
> replica count of the pool?
>

How many OSDs have to be added back to allow the cluster to recover depends
completely on the CRUSHMap.

A CRUSHMap has failure domains, which are usually hosts. You have to make
sure you have enough 'hosts' online with OSDs on them, one for each replica.

So with 3 replicas you need 3 hosts online with OSDs on them.

You can lower the replica count of a pool (size), but that makes it more
vulnerable to data loss.

Wido
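A minimal sketch of how to check this from the command line, assuming a pool
named "rbd" (the pool name is only a placeholder; use your own):

    # current replica count and minimum replicas required to serve I/O
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # which CRUSH rule the pool uses and its failure domain
    # (the variable is crush_rule instead of crush_ruleset on newer releases)
    ceph osd pool get rbd crush_ruleset
    ceph osd crush rule dump

    # how many hosts with up OSDs are actually available
    ceph osd tree

    # lowering the replica count, at the cost of less protection against data loss
    ceph osd pool set rbd size 2
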
> Best regards,
>
> On Fri, May 13, 2016 at 5:04 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
> >
> > > On 13 May 2016 at 11:55, Lazuardi Nasution <mrxlazuardin@xxxxxxxxx> wrote:
> > >
> > >
> > > Hi Wido,
> > >
> > > The status is the same after 24 hours of running. It seems that the
> > > status will not go to fully active+clean until all down OSDs are back
> > > again. The only way to bring the down OSDs back is reformatting them,
> > > or replacing them if the HDDs have a hardware issue. Do you think that
> > > is a safe way to do it?
> > >
> >
> > Ah, you are probably lacking enough replicas to make the recovery proceed.
> >
> > If that is needed I would do this OSD by OSD. Your crushmap will probably
> > tell you which OSDs you need to bring back before it works again.
> >
> > Wido
> >
> > > Best regards,
> > >
> > > On Fri, May 13, 2016 at 4:44 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
> > >
> > > >
> > > > > On 13 May 2016 at 11:34, Lazuardi Nasution <mrxlazuardin@xxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > After the disaster and restarting for automatic recovery, I found
> > > > > the following ceph status. Some OSDs cannot be restarted due to
> > > > > file system corruption (it seems that xfs is fragile).
> > > > >
> > > > > [root@management-b ~]# ceph status
> > > > >     cluster 3810e9eb-9ece-4804-8c56-b986e7bb5627
> > > > >      health HEALTH_WARN
> > > > >             209 pgs degraded
> > > > >             209 pgs stuck degraded
> > > > >             334 pgs stuck unclean
> > > > >             209 pgs stuck undersized
> > > > >             209 pgs undersized
> > > > >             recovery 5354/77810 objects degraded (6.881%)
> > > > >             recovery 1105/77810 objects misplaced (1.420%)
> > > > >      monmap e1: 3 mons at {management-a=10.255.102.1:6789/0,management-b=10.255.102.2:6789/0,management-c=10.255.102.3:6789/0}
> > > > >             election epoch 2308, quorum 0,1,2 management-a,management-b,management-c
> > > > >      osdmap e25037: 96 osds: 49 up, 49 in; 125 remapped pgs
> > > > >             flags sortbitwise
> > > > >       pgmap v9024253: 2560 pgs, 5 pools, 291 GB data, 38905 objects
> > > > >             678 GB used, 90444 GB / 91123 GB avail
> > > > >             5354/77810 objects degraded (6.881%)
> > > > >             1105/77810 objects misplaced (1.420%)
> > > > >                 2226 active+clean
> > > > >                  209 active+undersized+degraded
> > > > >                  125 active+remapped
> > > > >   client io 0 B/s rd, 282 kB/s wr, 10 op/s
> > > > >
> > > > > Since the total number of active PGs equals the total number of PGs
> > > > > and the total number of degraded PGs equals the total number of
> > > > > undersized PGs, does it mean that all PGs have at least one good
> > > > > replica, so I can just mark lost or remove the down OSDs, reformat
> > > > > them and then restart them if there is no hardware issue with the
> > > > > HDDs? Which PG status should I pay more attention to, degraded or
> > > > > undersized, given the possibility of lost objects?
> > > > >
> > > >
> > > > Yes. Your system is not reporting any inactive, unfound or stale PGs,
> > > > so that is good news.
> > > >
> > > > However, I recommend that you wait for the system to become fully
> > > > active+clean before you start removing any OSDs or formatting hard
> > > > drives. Better safe than sorry.
> > > >
> > > > Wido
> > > >
> > > > > Best regards,
> > > > > _______________________________________________
> > > > > ceph-users mailing list
> > > > > ceph-users@xxxxxxxxxxxxxx
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
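
For reference, a minimal sketch of the per-PG checks discussed in this thread,
i.e. confirming that no PG is incomplete and no objects are unfound before
removing or reformatting down OSDs. The pg id 3.1f is only a placeholder:

    # which PGs are degraded/undersized/stuck and why
    ceph health detail
    ceph pg dump_stuck unclean

    # detailed state of a single PG, including its acting set and recovery info
    ceph pg 3.1f query

    # objects the cluster knows about but currently has no readable copy of
    ceph pg 3.1f list_missing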