Re: Ceph stops responding

On 2014.03.05 13:23, Georgios Dimitrakakis wrote:
> Actually there are two monitors (my bad in the previous e-mail).
> One at the MASTER and one at the CLIENT.
>
> The monitor in CLIENT is failing with the following
>
> 2014-03-05 13:08:38.821135 7f76ba82b700  1
> mon.client1@0(leader).paxos(paxos active c 25603..26314) is_readable
> now=2014-03-05 13:08:38.821136 lease_expire=2014-03-05 13:08:40.845978
> has v0 lc 26314
> 2014-03-05 13:08:40.599287 7f76bb22c700  0
> mon.client1@0(leader).data_health(86) update_stats avail 4% total
> 51606140 used 46645692 avail 2339008
> 2014-03-05 13:08:40.599527 7f76bb22c700 -1
> mon.client1@0(leader).data_health(86) reached critical levels of
> available space on data store -- shutdown!
> 2014-03-05 13:08:40.599530 7f76bb22c700  0 ** Shutdown via Data Health
> Service **
> 2014-03-05 13:08:40.599557 7f76b9328700 -1 mon.client1@0(leader) e2
> *** Got Signal Interrupt ***
> 2014-03-05 13:08:40.599568 7f76b9328700  1 mon.client1@0(leader) e2
> shutdown
> 2014-03-05 13:08:40.599602 7f76b9328700  0 quorum service shutdown
> 2014-03-05 13:08:40.599609 7f76b9328700  0
> mon.client1@0(shutdown).health(86) HealthMonitor::service_shutdown 1
> services
> 2014-03-05 13:08:40.599613 7f76b9328700  0 quorum service shutdown
>
>
> The thing is that there is plenty of space on that host (CLIENT):
>
> # df -h
> Filesystem                     Size  Used Avail Use% Mounted on
> /dev/mapper/vg_one-lv_root     50G    45G  2.3G  96% /
> tmpfs                          5.9G     0  5.9G   0% /dev/shm
> /dev/sda1                      485M   76M  384M  17% /boot
> /dev/mapper/vg_one-lv_home     862G   249G 569G  31% /home
>
>
> On the other hand, the other host (MASTER) is running low on disk space
> (93% full).
>
> But why is the CLIENT failing while the MASTER is still running, even
> though it is the one running low on disk space?
>

CLIENT has a smaller percentage of space available than MASTER (96% used
vs. 93%), and that 96%-full root filesystem is where the mon keeps its
data store; I guess that is your problem.
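
The mon shuts itself down when the filesystem holding its data store
drops below a critical free-space threshold; if I remember correctly the
defaults are mon_data_avail_warn = 30 and mon_data_avail_crit = 5
(percent available), which matches the "avail 4%" line in your log. The
mon data store normally lives under /var/lib/ceph/mon/ on the root
filesystem, i.e. the one at 96% in your df output. A quick way to check
where the space is going and, once the mon is back up, which thresholds
it is using (the mon id "client1" and the default admin socket path are
my assumptions, adjust as needed):

# du -sh /var/lib/ceph/mon/*
# df -h /var/lib/ceph/mon
# ceph --admin-daemon /var/run/ceph/ceph-mon.client1.asok config show | grep mon_data_avail

Freeing space on / (or moving the mon data directory to the /home LV) is
the real fix; lowering mon_data_avail_crit in ceph.conf only postpones
the shutdown.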


> I'll try to free some space and see what happens next...
>
> Best,
>
> G.
>
>
>
> On Wed, 05 Mar 2014 11:50:57 +0100, Wido den Hollander wrote:
>> On 03/05/2014 11:21 AM, Georgios Dimitrakakis wrote:
>>> My setup consists of two nodes.
>>>
>>> The first node (master) is running:
>>>
>>> -mds
>>> -mon
>>> -osd.0
>>>
>>>
>>>
>>> and the second node (CLIENT) is running:
>>>
>>> -osd.1
>>>
>>>
>>> Therefore I've restarted the ceph services on both nodes.
>>>
>>>
>>> Leaving "ceph -w" running for as long as it can, after a few seconds
>>> the error that is produced is this:
>>>
>>> 2014-03-05 12:08:17.715699 7fba13fff700  0 monclient: hunting for
>>> new mon
>>> 2014-03-05 12:08:17.716108 7fba102f8700  0 -- 192.168.0.10:0/1008298 >>
>>> X.Y.Z.X:6789/0 pipe(0x7fba08008e50 sd=4 :0 s=1 pgs=0 cs=0 l=1
>>> c=0x7fba080090b0).fault
>>>
>>>
>>> (where X.Y.Z.X is the public IP of the CLIENT node).
>>>
>>> And it keeps going on...
>>>
>>> "ceph-health" after a few minutes shows the following
>>>
>>> 2014-03-05 12:12:58.355677 7effc52fb700  0 monclient(hunting):
>>> authenticate timed out after 300
>>> 2014-03-05 12:12:58.355717 7effc52fb700  0 librados: client.admin
>>> authentication error (110) Connection timed out
>>> Error connecting to cluster: TimedOut
>>>
>>>
>>> Any ideas now??
>>>
>>
>> Is the monitor actually running on the first node? If not, check
>> the logs in /var/log/ceph to see why it isn't running.
>>
>> Or maybe you just need to start it.
>>
>> Wido
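
For what it's worth, the "hunting for new mon" / "authenticate timed
out" output above just means the client cannot reach a monitor on port
6789, so Wido's question is the first thing to settle. Something along
these lines on the mon host should answer it quickly (the mon id
"client1" and the log filename are my guesses based on the default
naming scheme):

# ps aux | grep ceph-mon
# netstat -tlnp | grep 6789
# tail -n 50 /var/log/ceph/ceph-mon.client1.log

If ceph-mon is simply not running, "service ceph restart" (as you have
been doing) will bring it back, but only after you have freed the disk
space, otherwise it will shut itself down again.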
>>
>>> Best,
>>>
>>> G.
>>>
>>> On Wed, 5 Mar 2014 15:10:25 +0530, Srinivasa Rao Ragolu wrote:
>>>> First try to start the OSDs by restarting the ceph service on the ceph
>>>> nodes. If it works fine you should be able to see the ceph-osd process
>>>> running in the process list. You do not need to add any public or
>>>> private network in ceph.conf. If none of the OSDs run then you need to
>>>> reconfigure them from the monitor node.
>>>>
>>>> Please check whether the ceph-mon process is running on the monitor
>>>> node or not; ceph-mds should not be running.
>>>>
>>>> Also check that the /etc/hosts file has valid IP addresses for the cluster nodes.
>>>>
>>>> Finally, check that ceph.client.admin.keyring and ceph.bootstrap-osd.keyring
>>>> match on all the cluster nodes.
>>>>
>>>> Best of luck.
>>>> Srinivas.
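
In case it helps, this is roughly how I would go through that checklist
on each node; the keyring paths below are the usual default locations
and may differ on your setup:

# ps aux | egrep 'ceph-(mon|osd|mds)'
# cat /etc/hosts
# md5sum /etc/ceph/ceph.client.admin.keyring
# md5sum /var/lib/ceph/bootstrap-osd/ceph.keyring

Running the two md5sum commands on every node and comparing the output
is a quick way to see whether the keyrings really match.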
>>>>
>>>> On Wed, Mar 5, 2014 at 3:04 PM, Georgios Dimitrakakis  wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I have installed Ceph and created two OSDs, and was very happy with
>>>>> that, but apparently not everything was correct.
>>>>>
>>>>> Today, after a system reboot, the cluster comes up and for a few
>>>>> moments it seems to be OK (using the "ceph health" command), but
>>>>> after a few seconds the "ceph health" command doesn't produce any
>>>>> output at all.
>>>>>
>>>>> It just stays there without anything on the screen...
>>>>>
>>>>> ceph -w is doing the same as well...
>>>>>
>>>>> If I restart the ceph services ("service ceph restart") it works again
>>>>> for a few seconds, but after a few more it freezes.
>>>>>
>>>>> Initially I thought that this was a firewall problem, but apparently
>>>>> it isn't.
>>>>>
>>>>> Then I thought that this had to do with
>>>>>
>>>>> public_network
>>>>>
>>>>> cluster_network
>>>>>
>>>>> not being defined in ceph.conf, and changed that.
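
(Side note: on a simple two-node setup with a single network those two
options are usually unnecessary, as Srinivas notes above. If you do set
them, it is just something like the following in the [global] section of
ceph.conf; the /24 subnet here is only my guess based on the
192.168.0.10 address in your "ceph -w" output:

[global]
    public_network = 192.168.0.0/24
    cluster_network = 192.168.0.0/24
)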
>>>>>
>>>>> No matter what I do, the cluster works for a few seconds after
>>>>> the service restart and then it stops responding...
>>>>>
>>>>> Any help much appreciated!!!
>>>>>
>>>>> Best,
>>>>>
>>>>> G.
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx [1]
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]
>>>>
>>>>
>>>>
>>>> Links:
>>>> ------
>>>> [1] mailto:ceph-users@xxxxxxxxxxxxxx
>>>> [2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> [3] mailto:giorgis@xxxxxxxxxxxx
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



