Re: cluster not coming up after reboot

On 04/22/2015 07:35 PM, Gregory Farnum wrote:
> On Wed, Apr 22, 2015 at 8:17 AM, Kenneth Waegeman
> <kenneth.waegeman@xxxxxxxx> wrote:
>> Hi,
>>
>> I changed the cluster network parameter in the config files, restarted
>> the monitors, and then restarted all the OSDs (shouldn't have done that).

> Do you mean that you changed the IP addresses of the monitors in the
> config files everywhere, and then tried to restart things? Or
> something else?
I only changed the value of the cluster network to a different subnet than the public network.
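
For context, the relevant ceph.conf options look roughly like this (the subnets below are placeholders, not the actual values from this cluster):

    [global]
    public network  = 10.0.0.0/24    # monitor and client traffic
    cluster network = 10.0.1.0/24    # OSD replication and heartbeat traffic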

>> Now the OSDs keep on crashing, and the cluster is not able to recover. I
>> eventually rebooted the whole cluster, but the problem remains: for a
>> moment all 280 OSDs are up, and then they start crashing rapidly until
>> fewer than 100 are left (and eventually 30 or so).

> Are the OSDs actually crashing, or are they getting shut down? If
> they're crashing, can you please provide the actual backtrace? The
> logs you're including below are all fairly low-level and generally
> don't even mean something has to be wrong.

It seems I did not test the network thoroughly enough: there was one host that was unable to connect to the cluster network, only to the public network. I found this out after a few hours, when all OSDs except the ones on that host had come up. I fixed the network issue and all was fine (there were only a few peering problems, and restarting the blocking OSDs was sufficient). There were no backtraces, and indeed I found some shutdown messages in the logs.
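
For anyone hitting this later: a quick way to tell a crash from a clean stop is to search the OSD log for a signal-handler backtrace versus a shutdown message (the path assumes the default log location and osd.0; adjust for your OSD IDs):

    # A crash leaves a "Caught signal" line followed by a backtrace;
    # a clean stop just logs a shutdown message.
    grep -E 'Caught signal|shutdown' /var/log/ceph/ceph-osd.0.log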

So it is all fixed now, but is it explainable that one host's network failure caused about 90% of the OSDs to go into shutdown over and over at first, and that the cluster only reached a stable situation after some time?
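
For reference, this kind of flapping is visible with the standard status commands while it is happening (nothing cluster-specific assumed here):

    ceph -w                    # stream the cluster log; shows OSDs being marked down/up
    ceph osd tree | grep down  # list the OSDs currently marked down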

Thanks again!


>> In the log files I see different kinds of messages; some OSDs have:
>> <snip>
>> I tested the network; the hosts can reach one another on both networks.
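
(Aside: a fuller test would check every host on both subnets, e.g. with a loop like the one below; the addresses are placeholders matching the example subnets above, with 14 hosts numbered .1 through .14.)

    # Ping every host on both the public and cluster subnets; report failures.
    for net in 10.0.0 10.0.1; do
        for i in $(seq 1 14); do
            ping -c1 -W1 "$net.$i" >/dev/null || echo "unreachable: $net.$i"
        done
    done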

> What configurations did you test?
14 hosts, each with 16 key-value OSDs, plus 2 replicated cache partitions and metadata partitions on 2 SSDs, for CephFS.

> -Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



