Re: Not able to start OSD

Hi,

Have you checked the output of "ceph-disk list" on the nodes where the OSDs are not coming back up?
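
For example, on one of the affected nodes (the output varies a bit depending on your ceph-disk version):

# ceph-disk list

Each OSD data partition should show up as active with its osd id; anything reported as unmounted or unrecognized is a good place to start looking.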

This should give you a hint about what's going on.

Also check dmesg for any error messages.
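
For example (the grep pattern here is only a suggestion, adjust it to your controller and device names):

# dmesg -T | grep -iE 'error|fail|sd[a-z]'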

And finally inspect /var/log/ceph/ceph-osd.${id}.log to see messages produced by the OSD itself when it starts.
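
For example for osd.1, substituting the id of one of the down OSDs:

# tail -n 200 /var/log/ceph/ceph-osd.1.log

Since systemd reports the daemon being killed with SIGABRT, the end of that log may also contain the assert message and a stack trace. The journal is worth a look as well:

# journalctl -u ceph-osd@1 --no-pager -n 200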

Regards
JC

> On Oct 19, 2017, at 12:11, Josy <josy@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> Hi,
> 
> I am not able to start some of the OSDs in the cluster.
> 
> This is a test cluster with 8 OSDs. One node was taken out for maintenance. I set the noout flag beforehand and unset it after the server came back up.
> 
> Suddenly a couple of OSDs went down.
> 
> And now I can start the OSDs manually from each node, but the status is still "down".
> 
> $  ceph osd stat
> 8 osds: 2 up, 5 in
> 
> 
> $ ceph osd tree
> ID  CLASS WEIGHT  TYPE NAME                 STATUS REWEIGHT PRI-AFF
>  -1       7.97388 root default
>  -3       1.86469     host a1-osd
>   1   ssd 1.86469         osd.1               down        0 1.00000
>  -5       0.87320     host a2-osd
>   2   ssd 0.87320         osd.2               down        0 1.00000
>  -7       0.87320     host a3-osd
>   4   ssd 0.87320         osd.4               down  1.00000 1.00000
>  -9       0.87320     host a4-osd
>   8   ssd 0.87320         osd.8                 up  1.00000 1.00000
> -11       0.87320     host a5-osd
>  12   ssd 0.87320         osd.12              down  1.00000 1.00000
> -13       0.87320     host a6-osd
>  17   ssd 0.87320         osd.17                up  1.00000 1.00000
> -15       0.87320     host a7-osd
>  21   ssd 0.87320         osd.21              down  1.00000 1.00000
> -17       0.87000     host a8-osd
>  28   ssd 0.87000         osd.28              down        0 1.00000
> 
> I can also see this error on each OSD node.
> 
> # systemctl status ceph-osd@1
> ● ceph-osd@1.service - Ceph object storage daemon osd.1
>    Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
>    Active: failed (Result: start-limit) since Thu 2017-10-19 11:35:18 PDT; 19min ago
>   Process: 4163 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
>   Process: 4158 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>  Main PID: 4163 (code=killed, signal=ABRT)
> 
> Oct 19 11:34:58 ceph-las1-a1-osd systemd[1]: Unit ceph-osd@1.service entered failed state.
> Oct 19 11:34:58 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service failed.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service holdoff time over, scheduling restart.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: start request repeated too quickly for ceph-osd@1.service
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: Failed to start Ceph object storage daemon osd.1.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: Unit ceph-osd@1.service entered failed state.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service failed.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



