Not able to start OSD

Josy <josy@xxxxxxxxxxxxxxxxxxxxx> · Fri, 20 Oct 2017 00:41:13 +0530

Hi,

I am not able to start some of the OSDs in the cluster.

This is a test cluster with 8 OSDs. One node was taken out for 
maintenance. I set the noout flag, and after the server came back up I 
unset it.

Suddenly a couple of OSDs went down.

I can now start the OSDs manually on each node, but their status is 
still "down":

$ ceph osd stat
8 osds: 2 up, 5 in

$ ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                 STATUS REWEIGHT PRI-AFF
 -1       7.97388 root default
 -3       1.86469     host a1-osd
  1   ssd 1.86469         osd.1               down        0 1.00000
 -5       0.87320     host a2-osd
  2   ssd 0.87320         osd.2               down        0 1.00000
 -7       0.87320     host a3-osd
  4   ssd 0.87320         osd.4               down  1.00000 1.00000
 -9       0.87320     host a4-osd
  8   ssd 0.87320         osd.8                 up  1.00000 1.00000
-11       0.87320     host a5-osd
 12   ssd 0.87320         osd.12              down  1.00000 1.00000
-13       0.87320     host a6-osd
 17   ssd 0.87320         osd.17                up  1.00000 1.00000
-15       0.87320     host a7-osd
 21   ssd 0.87320         osd.21              down  1.00000 1.00000
-17       0.87000     host a8-osd
 28   ssd 0.87000         osd.28              down        0 1.00000
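
For reference, the down OSDs and their REWEIGHT values can be pulled out of the tree output with a small awk filter. This is only a sketch: `down_osds` is a made-up helper name, and it assumes the CLASS column is populated for every OSD, as in the output above (an OSD without a class would shift the columns).

```shell
# List OSDs that `ceph osd tree` reports as down, with their REWEIGHT value.
# Reads the plain-text tree output on stdin.
# Assumed column layout: ID CLASS WEIGHT NAME STATUS REWEIGHT PRI-AFF
down_osds() {
    awk '$4 ~ /^osd\.[0-9]+$/ && $5 == "down" { print $4, "reweight=" $6 }'
}

# Usage on a live cluster (needs the ceph CLI and an admin keyring):
#   ceph osd tree | down_osds
```

A REWEIGHT of 0 on a down OSD means it has also been marked out, which matches the "5 in" count from `ceph osd stat`.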

I also see this error on each OSD node:

# systemctl status ceph-osd@1
● ceph-osd@1.service - Ceph object storage daemon osd.1
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2017-10-19 11:35:18 PDT; 19min ago
  Process: 4163 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 4158 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 4163 (code=killed, signal=ABRT)

Oct 19 11:34:58 ceph-las1-a1-osd systemd[1]: Unit ceph-osd@1.service entered failed state.
Oct 19 11:34:58 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service failed.
Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service holdoff time over, scheduling restart.
Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: start request repeated too quickly for ceph-osd@1.service
Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: Failed to start Ceph object storage daemon osd.1.
Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: Unit ceph-osd@1.service entered failed state.
Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service failed.
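
The status output above only shows that the daemon was killed by SIGABRT and that systemd hit its start limit. It does not show why the OSD aborted. A rough sketch of how that detail could be collected is below. `osd_crash_info` and the `DRY_RUN` switch are hypothetical names introduced here, and the log path assumes the default cluster name "ceph"; adjust both to the actual setup.

```shell
# Gather the abort reason for one OSD and clear systemd's start limit.
# Set DRY_RUN=1 to print the commands instead of running them.
osd_crash_info() {
    id="$1"
    unit="ceph-osd@${id}.service"
    run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi
    # Recent journal entries for the unit; the abort message should be here.
    $run journalctl -u "$unit" --no-pager -n 200
    # OSD log file (assumed default path for a cluster named "ceph").
    $run tail -n 200 "/var/log/ceph/ceph-osd.${id}.log"
    # Reset the failed state so a later "systemctl start" is not refused.
    $run systemctl reset-failed "$unit"
}

# Usage, as root on the OSD node:
#   osd_crash_info 1
```

Running with `DRY_RUN=1` first is a cheap way to check the unit name and log path before touching the service.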

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com