On 2014.03.05 13:23, Georgios Dimitrakakis wrote:
> Actually there are two monitors (my bad in the previous e-mail).
> One at the MASTER and one at the CLIENT.
>
> The monitor on the CLIENT is failing with the following:
>
> 2014-03-05 13:08:38.821135 7f76ba82b700  1 mon.client1@0(leader).paxos(paxos active c 25603..26314) is_readable now=2014-03-05 13:08:38.821136 lease_expire=2014-03-05 13:08:40.845978 has v0 lc 26314
> 2014-03-05 13:08:40.599287 7f76bb22c700  0 mon.client1@0(leader).data_health(86) update_stats avail 4% total 51606140 used 46645692 avail 2339008
> 2014-03-05 13:08:40.599527 7f76bb22c700 -1 mon.client1@0(leader).data_health(86) reached critical levels of available space on data store -- shutdown!
> 2014-03-05 13:08:40.599530 7f76bb22c700  0 ** Shutdown via Data Health Service **
> 2014-03-05 13:08:40.599557 7f76b9328700 -1 mon.client1@0(leader) e2 *** Got Signal Interrupt ***
> 2014-03-05 13:08:40.599568 7f76b9328700  1 mon.client1@0(leader) e2 shutdown
> 2014-03-05 13:08:40.599602 7f76b9328700  0 quorum service shutdown
> 2014-03-05 13:08:40.599609 7f76b9328700  0 mon.client1@0(shutdown).health(86) HealthMonitor::service_shutdown 1 services
> 2014-03-05 13:08:40.599613 7f76b9328700  0 quorum service shutdown
>
> The thing is that there is plenty of space on that host (CLIENT):
>
> # df -h
> Filesystem                  Size  Used Avail Use% Mounted on
> /dev/mapper/vg_one-lv_root   50G   45G  2.3G  96% /
> tmpfs                       5.9G     0  5.9G   0% /dev/shm
> /dev/sda1                   485M   76M  384M  17% /boot
> /dev/mapper/vg_one-lv_home  862G  249G  569G  31% /home
>
> On the other hand, the other host (MASTER) is running low on disk space (93% full).
>
> But why is the CLIENT failing while the MASTER is still running, even though it is running low on disk space?

The CLIENT has a smaller percentage of space available than the MASTER (96% used vs 93%); I guess that is your problem.

> I'll try to free some space and see what happens next...
>
> Best,
>
> G.
>
>
> On Wed, 05 Mar 2014 11:50:57 +0100, Wido den Hollander wrote:
>> On 03/05/2014 11:21 AM, Georgios Dimitrakakis wrote:
>>> My setup consists of two nodes.
>>>
>>> The first node (MASTER) is running:
>>>
>>> - mds
>>> - mon
>>> - osd.0
>>>
>>> and the second node (CLIENT) is running:
>>>
>>> - osd.1
>>>
>>> Therefore I've restarted the ceph services on both nodes.
>>>
>>> Leaving "ceph -w" running for as long as it can, after a few seconds the error that is produced is this:
>>>
>>> 2014-03-05 12:08:17.715699 7fba13fff700  0 monclient: hunting for new mon
>>> 2014-03-05 12:08:17.716108 7fba102f8700  0 -- 192.168.0.10:0/1008298 >> X.Y.Z.X:6789/0 pipe(0x7fba08008e50 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fba080090b0).fault
>>>
>>> (where X.Y.Z.X is the public IP of the CLIENT node).
>>>
>>> And it keeps going on...
>>>
>>> "ceph health" after a few minutes shows the following:
>>>
>>> 2014-03-05 12:12:58.355677 7effc52fb700  0 monclient(hunting): authenticate timed out after 300
>>> 2014-03-05 12:12:58.355717 7effc52fb700  0 librados: client.admin authentication error (110) Connection timed out
>>> Error connecting to cluster: TimedOut
>>>
>>> Any ideas now??
>>>
>>
>> Is the monitor actually running on the first node? If not, check the logs in /var/log/ceph as to why it isn't running.
>>
>> Or maybe you just need to start it.
>>
>> Wido
>>
>>> Best,
>>>
>>> G.
>>>
>>> On Wed, 5 Mar 2014 15:10:25 +0530, Srinivasa Rao Ragolu wrote:
>>>> First try to start the OSDs by restarting the ceph service on the ceph nodes.
>>>> If that works fine, you should be able to see the ceph-osd process running in the process list, and you do not need to add any public or private network in ceph.conf. If none of the OSDs run, then you need to reconfigure them from the monitor node.
>>>>
>>>> Please check whether the ceph-mon process is running on the monitor node or not. ceph-mds should not run.
>>>>
>>>> Also check the /etc/hosts file for valid IP addresses of the cluster nodes.
>>>>
>>>> Finally, check that ceph.client.admin.keyring and ceph.bootstrap-osd.keyring match on all the cluster nodes.
>>>>
>>>> Best of luck.
>>>> Srinivas.
>>>>
>>>> On Wed, Mar 5, 2014 at 3:04 PM, Georgios Dimitrakakis wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I have installed Ceph and created two OSDs and was very happy with that, but apparently not everything was correct.
>>>>>
>>>>> Today, after a system reboot, the cluster comes up and for a few moments it seems to be OK (using the "ceph health" command), but after a few seconds the "ceph health" command doesn't produce any output at all.
>>>>>
>>>>> It just stays there without anything on the screen...
>>>>>
>>>>> "ceph -w" does the same as well...
>>>>>
>>>>> If I restart the ceph services ("service ceph restart"), it works again for a few seconds, but after a few more it stays frozen.
>>>>>
>>>>> Initially I thought that this was a firewall problem, but apparently it isn't.
>>>>>
>>>>> Then I thought that this had to do with
>>>>>
>>>>> public_network
>>>>>
>>>>> cluster_network
>>>>>
>>>>> not being defined in ceph.conf, and changed that.
>>>>>
>>>>> No matter what I do, the cluster works for a few seconds after the service restart and then it stops responding...
>>>>>
>>>>> Any help much appreciated!!!
>>>>>
>>>>> Best,
>>>>>
>>>>> G.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
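
A note on the "reached critical levels of available space on data store -- shutdown!" message above: the monitor's data_health service watches the filesystem that holds the mon data store (by default under /var/lib/ceph/mon/, which normally lives on the root filesystem, so the roomy /home volume does not help) and shuts the daemon down once available space drops below a critical threshold; the update_stats line shows it had fallen to 4%. The ceph.conf lines below are only a sketch, assuming the usual option names and defaults of that era (roughly 30% warn / 5% crit); raising the threshold only buys time, and freeing space on / or relocating the mon data directory is the real fix.

    [mon]
        mon data avail warn = 30    ; log a health warning below 30% free
        mon data avail crit = 5     ; shut the monitor down below 5% free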
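
For the "monclient: hunting for new mon" / "authenticate timed out" symptoms, a rough checklist in the spirit of Wido's and Srinivas's suggestions. This is only a sketch: the sysvinit "service ceph ..." syntax, the mon id "client1" taken from the log above, and the default log and data paths are assumptions that may need adjusting for a particular install.

    service ceph status                             # which local ceph daemons are running?
    df -h /var/lib/ceph/mon                         # free space on the mon data store
    tail -n 50 /var/log/ceph/ceph-mon.client1.log   # why the monitor stopped, if it did
    service ceph start mon.client1                  # start it again once space has been freed
    ceph -s                                         # should answer once a monitor is reachable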