Hi all :) ,
I need some help, I'm in a sad situation: I've physically lost 2 Ceph server nodes (out of 5 nodes / 5 monitors initially), so 3 nodes are left: node1, node2, node3.
On my first remaining node (node1), I updated the CRUSH map to remove every OSD that was running on those 2 lost servers:
ceph osd crush remove osd.<id> && ceph auth del osd.<id> && ceph osd rm osd.<id> for each of those OSDs, then ceph osd crush remove <lost-node> for the two lost hosts.
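In other words, something along these lines; the osd id and host names are only placeholders (I'm assuming the lost hosts are node4 and node5, going by the monmap below):

# for each OSD id that lived on one of the 2 lost servers, e.g. osd.12:
ceph osd crush remove osd.12   # take it out of the CRUSH map
ceph auth del osd.12           # delete its cephx key
ceph osd rm osd.12             # remove it from the osdmap
# then drop the 2 dead host buckets themselves from the CRUSH map:
ceph osd crush remove node4
ceph osd crush remove node5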
So the CRUSH map seems to be OK now on node1.
ceph osd tree on node1 shows every OSD hosted on node2 as "down 1", while the OSDs on node3 and node1 are "up 1". However, on node3 every ceph * command just hangs, so I'm not sure the CRUSH map has actually been updated on node2 and node3, and I don't know how to bring the OSDs on node2 up again.
My node2 says it cannot connect to the cluster!
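To sum up how each node behaves right now (commands only, I can't paste the hung/failed output):

# node1: ceph commands answer normally
ceph osd tree   # OSDs hosted on node2 show "down 1", those on node1/node3 show "up 1"
ceph -s         # output pasted below
# node3: any ceph command just hangs, e.g.
ceph osd tree
# node2: ceph commands fail, complaining they cannot connect to the cluster
ceph -s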
ceph -s on node1 gives me (so still 5 monitors):
cluster 45d9195b-365e-491a-8853-34b46553db94
health HEALTH_WARN 10016 pgs degraded; 10016 pgs stuck unclean; recovery 181055/544038 objects degraded (33.280%); 11/33 in osds are down; noout flag(s) set; 2 mons down, quorum 0,1,2 node1,node2,node3; clock skew detected on mon.node2
monmap e1: 5 mons at {node1=172.23.6.11:6789/0,node2=172.23.6.12:6789/0,node3=172.23.6.13:6789/0,node4=172.23.6.14:6789/0,node5=172.23.6.15:6789/0}, election epoch 488, quorum 0,1,2 node1,node2,node3
mdsmap e48: 1/1/1 up {0=node3=up:active}
osdmap e3852: 33 osds: 22 up, 33 in
flags noout
pgmap v8189785: 10016 pgs, 9 pools, 705 GB data, 177 kobjects
2122 GB used, 90051 GB / 92174 GB avail
181055/544038 objects degraded (33.280%)
10016 active+degraded
client io 0 B/s rd, 233 kB/s wr, 22 op/s
Thanks for your help!!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com