Re: gluster working, but error appearing every two seconds in logs - NEW INFO

Jordi Moles Blanco <jordi@xxxxxxxxx> · Thu, 19 Feb 2009 18:22:45 +0100

Jordi Moles Blanco wrote:
Anand Avati wrote:
For several days I didn't pay attention to the gluster logs, as
everything worked fine. However, today I was moving a 500MB file
and the mount point went stale; I couldn't access the data from that
particular client. Gluster itself didn't seem to be affected: the nodes
didn't report any problem at all in their log files, and other clients
kept the mount point without any problem.
Then I decided to have a look at the log files:

*************
2009-01-30 11:00:41 W [client-protocol.c:332:client_protocol_xfer] espai1: not connected at the moment to submit frame type(1) op(15)
2009-01-30 11:00:41 E [client-protocol.c:3891:client_statfs_cbk] espai1: no proper reply from server, returning ENOTCONN
2009-01-30 11:00:41 E [tcp-client.c:190:tcp_connect] espai5: non-blocking connect() returned: 111 (Connection refused)

2009-01-30 11:00:43 W [client-protocol.c:332:client_protocol_xfer] espai2: not connected at the moment to submit frame type(1) op(15)
2009-01-30 11:00:43 E [client-protocol.c:3891:client_statfs_cbk] espai2: no proper reply from server, returning ENOTCONN
2009-01-30 11:00:43 E [tcp-client.c:190:tcp_connect] espai6: non-blocking connect() returned: 111 (Connection refused)
*************

A "connection refused" error is returned when the daemon is not running,
or when a packet filter is resetting connections. If the GlusterFS daemon
is running and other clients are able to access it normally, please make
sure there is no packet filtering of some sort happening. You can try
flushing all firewall rules, if there are any. Based on the
description you give, it seems to be an issue outside GlusterFS.

Avati
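
One way to check the two causes named above from the affected client is a plain TCP connect to each server. The following is only a minimal sketch, not part of GlusterFS: the function name probe is made up for illustration, and the host and port must be taken from your own volume spec files. A "refused" result corresponds to the errno 111 (ECONNREFUSED on Linux) seen in the log, while a "timeout" would point more towards packets being silently dropped.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a single TCP connect and classify the outcome.

    Hypothetical helper for illustration; host and port are whatever
    your own server volfile uses.
    """
    try:
        # create_connection resolves the host and performs the connect()
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Same condition the client log reports as 111 (Connection refused)
        return "refused"
    except socket.timeout:
        # No answer at all: often a filter dropping packets
        return "timeout"
    except OSError as exc:
        # Anything else (unreachable network, name resolution, ...)
        return "error: %s" % exc
```

For example, probe("espai1", PORT), with PORT replaced by the port the server actually listens on, run in a loop from the failing client and from a healthy one, would show whether the refusals are specific to one machine.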

Hi,

thanks for the explanation about the origin of the error message.

Well... it doesn't look like there is a problem with the network on 
which glusterfs runs; it would have shown up in the rrd graphs I keep 
for net traffic. Still, I'll run a full test to see if there's 
the slightest problem which could generate this message.

Thanks.

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel

Hi,

Since the last time we were in contact I've been trying to track down 
where the problem is. I've been monitoring almost everything 
related to network traffic and... eventually... I found out what the 
problem is by chance!!

It turns out that when I run "df -h" on a client machine that mounts 
gluster, I get this:

***********
2009-02-19 17:15:43 E [tcp-client.c:190:tcp_connect] espai1: non-blocking connect() returned: 111 (Connection refused)
2009-02-19 17:15:43 W [client-protocol.c:332:client_protocol_xfer] espai1: not connected at the moment to submit frame type(1) op(15)
2009-02-19 17:15:43 E [client-protocol.c:3891:client_statfs_cbk] espai1: no proper reply from server, returning ENOTCONN
2009-02-19 17:15:43 E [tcp-client.c:190:tcp_connect] espai5: non-blocking connect() returned: 111 (Connection refused)
2009-02-19 17:15:43 W [client-protocol.c:332:client_protocol_xfer] espai5: not connected at the moment to submit frame type(1) op(15)
2009-02-19 17:15:43 E [client-protocol.c:3891:client_statfs_cbk] espai5: no proper reply from server, returning ENOTCONN
2009-02-19 17:15:43 E [tcp-client.c:190:tcp_connect] espai2: non-blocking connect() returned: 111 (Connection refused)
2009-02-19 17:15:43 W [client-protocol.c:332:client_protocol_xfer] espai2: not connected at the moment to submit frame type(1) op(15)
2009-02-19 17:15:43 E [client-protocol.c:3891:client_statfs_cbk] espai2: no proper reply from server, returning ENOTCONN
2009-02-19 17:15:43 E [tcp-client.c:190:tcp_connect] espai6: non-blocking connect() returned: 111 (Connection refused)
2009-02-19 17:15:43 W [client-protocol.c:332:client_protocol_xfer] espai6: not connected at the moment to submit frame type(1) op(15)
2009-02-19 17:15:43 E [client-protocol.c:3891:client_statfs_cbk] espai6: no proper reply from server, returning ENOTCONN

************

So... the reason why it appears so often is that I've got munin 
monitoring this gluster environment, and it runs a "df" command to 
check the disk space of all the servers, including, of course, the 
gluster mount point. When this happens, the errors shown above are 
logged and eventually the mount point on that server fails. No data 
is lost, but I have to remount glusterfs, as the mount becomes stale 
and the data is not accessible.
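
To take munin out of the picture, the same kind of request can be issued by hand. df gets its numbers from a statfs/statvfs call on the mount point, which matches the client_statfs_cbk entries in the log. The following is just a rough sketch of such a test: the function names are made up, the mount path is whatever your client uses, and whether it reproduces the stale mount is exactly the open question here.

```python
import os
import time


def statfs_once(path):
    """Issue one statvfs() call, the kind of query df makes per mount."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize   # total size in bytes
    free = st.f_bavail * st.f_frsize    # bytes available to non-root users
    return total, free


def poll(path, count=5, delay=1.0):
    """Repeat the query and record either (total, free) or the errno."""
    results = []
    for _ in range(count):
        try:
            results.append(statfs_once(path))
        except OSError as exc:
            # e.g. ENOTCONN if the mount has gone stale
            results.append(exc.errno)
        time.sleep(delay)
    return results
```

Running poll() against the gluster mount point (path to be filled in) while watching the client log would show whether each call produces the three-line error pattern above.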

Is this normal behaviour?

I could stop munin from running "df" every 5 minutes... but still, is 
there a problem in my setup, or is this what gluster is supposed to do?

Thanks.