Re: cluster not coming up after reboot

Gregory Farnum <greg@xxxxxxxxxxx> · Wed, 22 Apr 2015 10:35:10 -0700

On Wed, Apr 22, 2015 at 8:17 AM, Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
> Hi,
>
> I changed the cluster network parameter in the config files, restarted the
> monitors, and then restarted all the OSDs (shouldn't have done that).

Do you mean that you changed the IP addresses of the monitors in the
config files everywhere, and then tried to restart things? Or
something else?

> Now
> the OSDs keep crashing, and the cluster is not able to recover. I
> eventually rebooted the whole cluster, but the problem remains: for a moment
> all 280 OSDs are up, and then they start crashing rapidly until there are
> fewer than 100 left (and eventually 30 or so).

Are the OSDs actually crashing, or are they getting shut down? If
they're crashing, can you please provide the actual backtrace? The
logs you're including below are all fairly low-level and generally
don't mean that anything is actually wrong.

> In the log files I see different kinds of messages: Some OSDs have:
> <snip>
> I tested the network; the hosts can reach one another on both networks.

What configurations did you test?
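
To make the question concrete, here is a rough sketch of the kind of
check I have in mind: from each host, try a TCP connection to every
peer on both the public and the cluster network, rather than relying
on ping alone. The function names and the addresses in the usage note
are hypothetical placeholders, not values from this cluster, and this
only shows whether a port accepts connections, nothing more.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peers(peers, port, timeout=2.0):
    """Map each peer address to whether a TCP connect on `port` succeeded."""
    return {addr: can_connect(addr, port, timeout) for addr in peers}
```

You would run something like check_peers(["192.0.2.11", "192.0.2.12"], 6789)
for the public addresses and again for the cluster-network addresses
(placeholder IPs; 6789 is the default monitor port, and the port to use
for OSDs depends on what each daemon actually bound to).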
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com