Hi,
I am not able to start some of the OSDs in the cluster.
This is a test cluster with 8 OSDs. One node was taken out for
maintenance. I set the noout flag, and after the server came back up I
unset it.
Suddenly a couple of OSDs went down.
Now I can start the OSDs manually on each node, but their status is
still "down":
$ ceph osd stat
8 osds: 2 up, 5 in
$ ceph osd tree
ID  CLASS  WEIGHT TYPE NAME       STATUS REWEIGHT PRI-AFF
 -1       7.97388 root default
 -3       1.86469     host a1-osd
  1   ssd 1.86469         osd.1     down        0 1.00000
 -5       0.87320     host a2-osd
  2   ssd 0.87320         osd.2     down        0 1.00000
 -7       0.87320     host a3-osd
  4   ssd 0.87320         osd.4     down  1.00000 1.00000
 -9       0.87320     host a4-osd
  8   ssd 0.87320         osd.8       up  1.00000 1.00000
-11       0.87320     host a5-osd
 12   ssd 0.87320         osd.12    down  1.00000 1.00000
-13       0.87320     host a6-osd
 17   ssd 0.87320         osd.17      up  1.00000 1.00000
-15       0.87320     host a7-osd
 21   ssd 0.87320         osd.21    down  1.00000 1.00000
-17       0.87000     host a8-osd
 28   ssd 0.87000         osd.28    down        0 1.00000
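
For anyone comparing against their own cluster, here is a minimal sketch of how the down OSDs could be pulled out of the tree output above. The function name `down_osds` is hypothetical, and the sketch assumes the column order shown above (ID, CLASS, WEIGHT, NAME, STATUS, REWEIGHT, PRI-AFF on OSD rows). It also assumes that a REWEIGHT of 0 means the OSD is marked out, which would be consistent with the "5 in" reported by `ceph osd stat`.

```shell
# Hypothetical helper: list OSDs reported "down" in `ceph osd tree` text
# read from stdin, together with whether they look "in" or "out".
# Assumption: OSD rows have the fields
#   ID CLASS WEIGHT NAME STATUS REWEIGHT PRI-AFF
# and REWEIGHT 0 means the OSD is marked out.
down_osds() {
    awk '$4 ~ /^osd\.[0-9]+$/ && $5 == "down" {
        state = ($6 + 0 == 0) ? "out" : "in"
        print $4, state
    }'
}

# Intended use on a live cluster:
#   ceph osd tree | down_osds
```

Host and root rows are skipped because their fourth field is not an `osd.N` name, so only OSD rows are examined.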
I also see this error on each OSD node:
# systemctl status ceph-osd@1
● ceph-osd@1.service - Ceph object storage daemon osd.1
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2017-10-19 11:35:18 PDT; 19min ago
  Process: 4163 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 4158 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 4163 (code=killed, signal=ABRT)

Oct 19 11:34:58 ceph-las1-a1-osd systemd[1]: Unit ceph-osd@1.service entered failed state.
Oct 19 11:34:58 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service failed.
Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service holdoff time over, scheduling restart.
Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: start request repeated too quickly for ceph-osd@1.service
Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: Failed to start Ceph object storage daemon osd.1.
Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: Unit ceph-osd@1.service entered failed state.
Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service failed.
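
The "start-limit" result above only says that systemd gave up after repeated failed starts. The reason for the ABRT itself is not in this output. Below is a minimal sketch of how the unit could be reset and the underlying error looked up. The function name `osd_retry` and the `DRY_RUN` switch are hypothetical, not part of Ceph or systemd. It assumes the abort message reaches the journal; it may instead be in the OSD's own log file.

```shell
# Hypothetical helper: clear the start-limit state for one OSD unit,
# try a single start, then show recent journal entries for that unit.
# With DRY_RUN=1 the commands are only printed, not executed.
osd_retry() {
    id="$1"
    unit="ceph-osd@${id}.service"
    run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

    # Clear the failed/start-limit state so systemd accepts a new start request.
    run systemctl reset-failed "$unit"
    # Attempt one start; do not stop on failure so the log is still shown.
    run systemctl start "$unit" || true
    # Show the last 100 journal lines for the unit.
    run journalctl -u "$unit" -n 100 --no-pager
}

# Intended use, as root on the OSD node:
#   osd_retry 1
```

Running it with `DRY_RUN=1 osd_retry 1` first prints the three commands so they can be checked before anything is changed on the node.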
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com