Actually there are two monitors (my bad in the previous e-mail):
one on the MASTER and one on the CLIENT.
The monitor on the CLIENT is failing with the following:
2014-03-05 13:08:38.821135 7f76ba82b700  1 mon.client1@0(leader).paxos(paxos active c 25603..26314) is_readable now=2014-03-05 13:08:38.821136 lease_expire=2014-03-05 13:08:40.845978 has v0 lc 26314
2014-03-05 13:08:40.599287 7f76bb22c700  0 mon.client1@0(leader).data_health(86) update_stats avail 4% total 51606140 used 46645692 avail 2339008
2014-03-05 13:08:40.599527 7f76bb22c700 -1 mon.client1@0(leader).data_health(86) reached critical levels of available space on data store -- shutdown!
2014-03-05 13:08:40.599530 7f76bb22c700  0 ** Shutdown via Data Health Service **
2014-03-05 13:08:40.599557 7f76b9328700 -1 mon.client1@0(leader) e2 *** Got Signal Interrupt ***
2014-03-05 13:08:40.599568 7f76b9328700  1 mon.client1@0(leader) e2 shutdown
2014-03-05 13:08:40.599602 7f76b9328700  0 quorum service shutdown
2014-03-05 13:08:40.599609 7f76b9328700  0 mon.client1@0(shutdown).health(86) HealthMonitor::service_shutdown 1 services
2014-03-05 13:08:40.599613 7f76b9328700  0 quorum service shutdown
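As a sanity check, the "avail 4%" in the data_health line above is just the avail/total ratio of the KB figures in that same line (which also match the df output further down). A quick recomputation, with the numbers copied from the log:

```python
# Recompute the monitor's "avail %" from the KB values in the
# update_stats log line above.
total_kb = 51606140   # total
used_kb = 46645692    # used
avail_kb = 2339008    # avail

avail_pct = avail_kb * 100 // total_kb
print(avail_pct)  # 4
```

So the monitor really does see only 4% free on the filesystem holding its data store.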
The thing is that there is plenty of space on that host (CLIENT):
# df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_one-lv_root   50G   45G  2.3G  96% /
tmpfs                       5.9G     0  5.9G   0% /dev/shm
/dev/sda1                   485M   76M  384M  17% /boot
/dev/mapper/vg_one-lv_home  862G  249G  569G  31% /home
On the other hand, the other host (MASTER) is also running low on disk
space (93% full).
But why is the CLIENT failing while the MASTER is still running, even
though it is running low on disk space too?
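Presumably the shutdown is the mon's data-health check comparing that 4% free against its critical threshold; assuming the standard `mon data avail warn` / `mon data avail crit` options (crit defaults to 5%), a minimal ceph.conf sketch for adjusting them would look like this (the values shown are hypothetical, not a recommendation):

```ini
[mon]
# Hypothetical overrides; mon data avail crit defaults to 5 (percent).
# The monitor shuts itself down when the filesystem holding its data
# store (typically /var/lib/ceph/mon) drops below the crit threshold.
mon data avail warn = 10
mon data avail crit = 5
```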
I'll try to free some space and see what happens next...
Best,
G.
On Wed, 05 Mar 2014 11:50:57 +0100, Wido den Hollander wrote:
On 03/05/2014 11:21 AM, Georgios Dimitrakakis wrote:
My setup consists of two nodes.
The first node (master) is running:
-mds
-mon
-osd.0
and the second node (CLIENT) is running:
-osd.1
Therefore I've restarted the ceph services on both nodes.
Leaving "ceph -w" running for as long as it can, after a few seconds
the error that is produced is this:
2014-03-05 12:08:17.715699 7fba13fff700  0 monclient: hunting for new mon
2014-03-05 12:08:17.716108 7fba102f8700  0 -- 192.168.0.10:0/1008298 >> X.Y.Z.X:6789/0 pipe(0x7fba08008e50 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fba080090b0).fault
(where X.Y.Z.X is the public IP of the CLIENT node).
And it keeps going on...
"ceph health" after a few minutes shows the following:
2014-03-05 12:12:58.355677 7effc52fb700  0 monclient(hunting): authenticate timed out after 300
2014-03-05 12:12:58.355717 7effc52fb700  0 librados: client.admin authentication error (110) Connection timed out
Error connecting to cluster: TimedOut
Any ideas now??
Is the monitor actually running on the first node? If not, check
the logs in /var/log/ceph as to why it isn't running.
Or maybe you just need to start it.
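A quick way to check is something like this (a sketch; the process name and log path are the usual defaults):

```shell
# Look for a running ceph-mon process; if absent, the reason is
# usually in the monitor's log under /var/log/ceph/.
pgrep -a ceph-mon || echo "ceph-mon not running"
```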
Wido
Best,
G.
On Wed, 5 Mar 2014 15:10:25 +0530, Srinivasa Rao Ragolu wrote:
First, try to start the OSDs by restarting the ceph service on the ceph
nodes. If it works fine, you should be able to see the ceph-osd process
running in the process list. You do not need to add any public or
private network in ceph.conf. If none of the OSDs run, then you need to
reconfigure them from the monitor node.
Please check whether the ceph-mon process is running on the monitor
node or not (ceph-mds should not run).
Also check that the /etc/hosts file has valid IP addresses for the
cluster nodes.
Finally, check that ceph.client.admin.keyring and
ceph.bootstrap-osd.keyring match on all the cluster nodes.
Best of luck.
Srinivas.
On Wed, Mar 5, 2014 at 3:04 PM, Georgios Dimitrakakis wrote:
Hi!
I have installed ceph and created two OSDs, and was very happy with
that, but apparently not everything was correct.
Today, after a system reboot, the cluster comes up and for a few
moments it seems that it's ok (using the "ceph health" command), but
after a few seconds the "ceph health" command doesn't produce any
output at all.
It just stays there without anything on the screen...
ceph -w is doing the same as well...
If I restart the ceph services ("service ceph restart"), it works
again for a few seconds, but after a few more it stays frozen.
Initially I thought that this was a firewall problem, but apparently
it isn't.
Then I thought that this had to do with public_network and
cluster_network not being defined in ceph.conf, and changed that.
No matter what I do, the cluster works for a few seconds after the
service restart and then it stops responding...
Any help much appreciated!!!
Best,
G.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com