Fwd: Is this a deadlock?

We've already restarted the OSD successfully.
Now we are trying to figure out why the OSD committed suicide.

Re:  Is this a deadlock?

Hi, thanks for the quick reply.

We manually deployed this OSD, and it has been running for more than half a year. The output last night should be the latter one that you mentioned. Last night, one of our switches had a problem that cut the OSD off from its peers, which in turn caused the monitor to wrongly mark the OSD down.

Thank you:-)



On Wed, 4 Jan 2017 07:49:03 +0000 许雪寒 wrote:

> Hi, everyone.
> 
> Recently in one of our online Ceph clusters, one OSD committed suicide after experiencing some network connectivity problem, and the OSD log is as follows:
>

Version of Ceph and all relevant things would help.
Also "some network connectivity problem" is vague, if it were something like a bad port or overloaded switch you'd think that more than one OSD would be affected.

[snip, I have nothing to comment on that part]
> 
> 

> And by the way, when we first tried to restart the OSD that committed suicide through “/etc/init.d/ceph start osd.619”, an error was reported saying something like “OSD.619 is not found”, as if OSD.619 had never been created in this cluster. We are really confused, please help us.
> 
How did you create that OSD?
Manually or with ceph-deploy?
The fact that you're trying to use a SYS-V initscript suggests both an older Ceph version and OS, and thus more likely a manual install.

In which case that OSD needs to be defined in ceph.conf on that node.
Full output of that error message would have told us these things, like:
---
root@ceph-04:~# /etc/init.d/ceph start osd.444
/etc/init.d/ceph: osd.444 not found (/etc/ceph/ceph.conf defines mon.ceph-04 osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24 , /var/lib/ceph defines mon.ceph-04 osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24)
---
The above is the output from a Hammer cluster with OSDs deployed with ceph-deploy.
And incidentally the "ceph.conf" part of the output is a blatant lie and just a repetition of what it gathered from /var/lib/ceph.

This is a Hammer cluster with manually deployed OSDs:
---
engtest03:~# /etc/init.d/ceph start osd.33
/etc/init.d/ceph: osd.33 not found (/etc/ceph/ceph.conf defines mon.engtest03 mon.engtest04 mon.engtest05 mon.irt03 mon.irt04 mds.engtest03 osd.20 osd.21 osd.22 osd.23, /var/lib/ceph defines )
---
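
For a manually deployed OSD like the osd.619 in question, the ceph.conf definition on that node might look roughly like the sketch below. The host value is a placeholder, not taken from this thread, and this assumes the initscript matches the "host" entry against the node's short hostname:
---
[osd.619]
    ; placeholder: replace with the short hostname of the node carrying this OSD
    host = your-osd-node
---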

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



