Re: gluster working, but error appearing every two seconds in logs

Jordi,

With the information you have given, it is difficult to guess what
might be causing the problem. The "Connection refused" message
indicates that the server process was not running. Can you check?
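For example, on the espai5 and espai6 boxes, something like this
(assuming the default 1.3.x listen port of 6996 - adjust if you set a
different listen-port in your server volfile):

  ps ax | grep glusterfsd
  netstat -nlt | grep 6996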
About the stale mount point - were commands hanging and never
returning when you tried to operate on the mount point, or were they
giving errors such as "Transport endpoint is not connected"?
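A quick way to tell is to run something like this on the affected
client (replace /mnt/glusterfs with your actual mount point) and see
whether it hangs or returns an error:

  stat /mnt/glusterfs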

The 1.3.x releases are old; you could try 2.0rc1.

Using six subvolumes for the namespace AFR is not a good idea, as it
will cause an unneeded performance hit. You could use two subvolumes,
or maybe three if you are paranoid.
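For example, a two-way namespace AFR built from the namespace client
volumes you already have defined in your client volfile:

volume nm
      type cluster/afr
      subvolumes namespace1 namespace2
end-volume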

Regards
Krishna

On Fri, Jan 30, 2009 at 4:21 PM, Jordi Moles Blanco <jordi@xxxxxxxxx> wrote:
> Hello everyone,
>
> I'm using gluster version 1.3.x, patch 800, from the tla repositories.
>
> The thing is that I have 6 nodes providing a total of 2TB, with several
> clients constantly accessing the data and putting gluster under a lot
> of load.
>
> For several days I didn't pay attention to the gluster logs, as
> everything worked fine. Today, however, while I was moving a 500MB
> file, the mount point went stale and I couldn't access the data from
> that particular client. The gluster cluster itself didn't seem to be
> affected: the nodes didn't report any problem at all in their log
> files, and other clients kept their mount points working without any
> problem.
> Then I decided to have a look at the log files:
>
>
> *************
> 2009-01-30 11:00:41 W [client-protocol.c:332:client_protocol_xfer] espai1:
> not connected at the moment to submit frame type(1) op(15)
> 2009-01-30 11:00:41 E [client-protocol.c:3891:client_statfs_cbk] espai1: no
> proper reply from server, returning ENOTCONN
> 2009-01-30 11:00:41 E [tcp-client.c:190:tcp_connect] espai5: non-blocking
> connect() returned: 111 (Connection refused)
>
> 2009-01-30 11:00:43 W [client-protocol.c:332:client_protocol_xfer] espai2:
> not connected at the moment to submit frame type(1) op(15)
> 2009-01-30 11:00:43 E [client-protocol.c:3891:client_statfs_cbk] espai2: no
> proper reply from server, returning ENOTCONN
> 2009-01-30 11:00:43 E [tcp-client.c:190:tcp_connect] espai6: non-blocking
> connect() returned: 111 (Connection refused)
> *************
>
> This goes on and on for days, printing an error message every 2-3
> seconds.
>
> Is there any major bug in the version I'm using? Is there any way to
> fix this?
>
> If I look through the whole logfile, I can't see any message (apart
> from this one that repeats every 2-3 seconds) indicating why the
> mount point went stale and the data became inaccessible from that
> client.
>
> Does this error message have anything to do with today's issue? Could that
> message cause a failure in the system when moving, deleting or creating
> large files?
>
> These are the config files:
>
>
>
> NODE:
>
> ***********
>
> volume esp
>       type storage/posix
>       option directory /glu0/data
> end-volume
>
> volume espai
>       type performance/io-threads
>       option thread-count 15
>       option cache-size 512MB
>       subvolumes esp
> end-volume
>
> volume nm
>       type storage/posix
>       option directory /glu0/ns
> end-volume
>
> volume ultim
>   type protocol/server
>   subvolumes espai nm
>   option transport-type tcp/server
>   option auth.ip.espai.allow *
>   option auth.ip.nm.allow *
> end-volume
>
> ***********
>
> CLIENT:
>
> ********
> volume espai1
>       type protocol/client
>       option transport-type tcp/client
>       option remote-host 10.0.0.3
>       option remote-subvolume espai
> end-volume
>
> volume espai2
>       type protocol/client
>       option transport-type tcp/client
>       option remote-host 10.0.0.4
>       option remote-subvolume espai
> end-volume
>
> volume espai3
>       type protocol/client
>       option transport-type tcp/client
>       option remote-host 10.0.0.5
>       option remote-subvolume espai
> end-volume
>
> volume espai4
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.6
>   option remote-subvolume espai
> end-volume
>
> volume espai5
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.7
>   option remote-subvolume espai
> end-volume
>
> volume espai6
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.8
>   option remote-subvolume espai
> end-volume
>
> volume namespace1
>       type protocol/client
>       option transport-type tcp/client
>       option remote-host 10.0.0.3
>       option remote-subvolume nm
> end-volume
>
> volume namespace2
>       type protocol/client
>       option transport-type tcp/client
>       option remote-host 10.0.0.4
>       option remote-subvolume nm
> end-volume
>
> volume namespace3
>       type protocol/client
>       option transport-type tcp/client
>       option remote-host 10.0.0.5
>       option remote-subvolume nm
> end-volume
>
> volume namespace4
>       type protocol/client
>       option transport-type tcp/client
>       option remote-host 10.0.0.6
>       option remote-subvolume nm
> end-volume
>
> volume namespace5
>       type protocol/client
>       option transport-type tcp/client
>       option remote-host 10.0.0.7
>       option remote-subvolume nm
> end-volume
>
> volume namespace6
>       type protocol/client
>       option transport-type tcp/client
>       option remote-host 10.0.0.8
>       option remote-subvolume nm
> end-volume
>
> volume grup1
>       type cluster/afr
>       subvolumes espai1 espai3 espai5
> end-volume
>
> volume grup2
>       type cluster/afr
>       subvolumes espai2 espai4 espai6
> end-volume
>
> volume nm
>       type cluster/afr
>       subvolumes namespace1 namespace2 namespace3 namespace4 namespace5
> namespace6
> end-volume
>
> volume g01
>       type cluster/unify
>       subvolumes grup1 grup2
>       option scheduler rr
>       option namespace nm
> end-volume
>
> volume io-cache
>       type performance/io-cache
>       option cache-size 512MB
>       option page-size 1MB
>       option force-revalidate-timeout 2
>       subvolumes g01
> end-volume
> ************
>
> Thanks.
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>



