The problem we experienced was occasional packet loss (not high, only very
occasional). You will see that in almost every LAN. If a ping packet is lost
and you have configured a low value, a brick will be taken offline quite
fast even though there is no real problem. The bigger the timeout, the
better the chance that a following ping packet makes it through and resets
the wait time.

On Fri, 20 Nov 2009 14:18:46 +0100
Marek <mb at kis.p.lodz.pl> wrote:

> Why do you suggest such a high value for ping-timeout?
> When a brick gets into trouble, the mounted fs on the client side is
> unusable (I/O is locked) and has to wait 120 sec. for the timeout to
> "release" the fs.
> Client I/O locked for 120 sec. is not acceptable.
>
>
> regards,
>
> Stephan von Krawczynski wrote:
> > Try setting your ping-timeout way higher; since we use 120 we have
> > almost no issues in regular use. Nevertheless, we do believe every
> > problem will come back when some brick(s) die...
> >
> >
> > On Tue, 10 Nov 2009 14:59:07 +0100
> > Marek Blaszkowski <mb at kis.p.lodz.pl> wrote:
> >
> >> OK,
> >> here go some more details: on the "bad" servers (the ones with the
> >> strange lockups) we have problems opening/moving files. We are unable
> >> to open, move, or even ls files (the file utilities just hang)....
> >>
> >>
> >> Marek wrote:
> >>> Hello,
> >>> we're testing a simple configuration of glusterfs 2.0.7 with 4 servers
> >>> and 1 client (2+2 bricks: two replicated pairs combined by a
> >>> distribute translator, configs below).
> >>> During our tests (the client copying/moving a lot of small files on
> >>> the glusterfs-mounted FS) we got strange lockups on two of the
> >>> servers (bricks).
> >>> I was unable to log in (via ssh) to the server; in already started
> >>> terminal sessions I couldn't spawn a "top" process (it just hangs),
> >>> and vmstat exits with a floating point exception.
> >>> Other fileutils commands behave normally.
> >>> There were no dmesg kernel messages (the first guess was a kernel
> >>> oops or some other kernel-related problem).
> >>> This server never had any CPU/memory problems under high load before.
> >>> The problems start when we run glusterfsd on this server. We had to
> >>> hard-reset the malfunctioning server (reboot didn't work).
> >>> After a couple of hours of testing, another server disconnected from
> >>> the client (according to the client debug log).
> >>> The scenario was the same:
> >>> 1. unable to log in to the server; the connection was established but
> >>> sshd on the server side hung/timed out after the user password was
> >>> entered
> >>> 2. in previously established terminal sessions it was impossible to
> >>> run the top or vmstat utilities (vmstat exits with a floating point
> >>> exception). Copying/moving files was OK. Load was 0.00, 0.00, 0.00
> >>>
> >>>
> >>> What could be wrong? These servers never had problems before (they
> >>> are simple terminal/proxy servers). The strange locking looks like it
> >>> is related to kernel VM structures (why else would top/vmstat behave
> >>> oddly??) or to other kernel problems.
> >>>
> >>> Server remote1 details: Linux version 2.6.26-1-686 (Debian
> >>> 2.6.26-13lenny2) (dannf at debian.org)
> >>> (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP
> >>> Fri Mar 13 18:08:45 UTC 2009,
> >>> running Debian 5.0
> >>>
> >>> Server remote2 details: Linux version 2.6.22-14-server
> >>> (buildd at palmer) (gcc version 4.1.3 20070929
> >>> (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Sun Oct 14 23:34:23
> >>> GMT 2007,
> >>> running Ubuntu.
> >>> Both run glusterfsd:
> >>> /usr/local/sbin/glusterfsd -p /var/run/glusterfsd.pid -f
> >>> /usr/local/etc/glusterfs/glusterfs-server.vol
> >>>
> >>> Note that the two servers run different OS versions and hit similar
> >>> lockup problems, having never had problems before (without
> >>> glusterfsd).
> >>>
> >>>
> >>> Server gluster config file (the same on all 4 servers):
> >>> -----------------cut here------------------------
> >>> volume brick
> >>>   type storage/posix
> >>>   option directory /var/gluster
> >>> end-volume
> >>>
> >>> volume locks
> >>>   type features/posix-locks
> >>>   subvolumes brick
> >>> end-volume
> >>>
> >>> volume server
> >>>   type protocol/server
> >>>   option transport-type tcp/server
> >>>   option auth.ip.locks.allow *
> >>>   option auth.ip.brick-ns.allow *
> >>>   subvolumes locks
> >>> end-volume
> >>> -----------------cut here-----------------------
> >>>
> >>> Client gluster config below (please note that remote1 and remote4 are
> >>> the servers with the problems mentioned above); the gluster client
> >>> was started with the command:
> >>> glusterfs --log-file=/var/log/gluster-client -f
> >>> /usr/local/etc/glusterfs/glusterfs-client.vol /var/glustertest
> >>>
> >>>
> >>> -----------------client config-cut here-----------------------
> >>> volume remote1
> >>>   type protocol/client
> >>>   option transport-type tcp/client
> >>>   option remote-host 192.168.2.184
> >>>   option ping-timeout 5
> >>>   option remote-subvolume locks
> >>> end-volume
> >>>
> >>> volume remote2
> >>>   type protocol/client
> >>>   option transport-type tcp/client
> >>>   option remote-host 192.168.2.195
> >>>   option ping-timeout 5
> >>>   option remote-subvolume locks
> >>> end-volume
> >>>
> >>> volume remote3
> >>>   type protocol/client
> >>>   option transport-type tcp/client
> >>>   option remote-host 192.168.2.145
> >>>   option ping-timeout 5
> >>>   option remote-subvolume locks
> >>> end-volume
> >>>
> >>> volume remote4
> >>>   type protocol/client
> >>>   option transport-type tcp/client
> >>>   option remote-host 192.168.2.193
> >>>   option ping-timeout 5
> >>>   option remote-subvolume locks
> >>> end-volume
> >>>
> >>> volume afr1
> >>>   type cluster/replicate
> >>>   subvolumes remote1 remote3
> >>> end-volume
> >>>
> >>> volume afr2
> >>>   type cluster/replicate
> >>>   subvolumes remote2 remote4
> >>> end-volume
> >>>
> >>>
> >>> volume bigfs
> >>>   type cluster/distribute
> >>>   subvolumes afr1 afr2
> >>> end-volume
> >>>
> >>> volume writebehind
> >>>   type performance/write-behind
> >>>   option flush-behind on
> >>>   option cache-size 3MB
> >>>   subvolumes bigfs
> >>> end-volume
> >>>
> >>> volume readahead
> >>>   type performance/read-ahead
> >>>   option page-count 16
> >>>   subvolumes writebehind
> >>> end-volume
> >>> -----------------cut here--------------------------------------
> >>>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
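
For reference, the fix debated in this thread is a one-line change per
protocol/client volume. Below is a minimal sketch of the remote1 volume
from Marek's client config with Stephan's suggested ping-timeout of 120
seconds in place of 5; the value 120 comes from Stephan's message, and
everything else is copied from the configuration quoted above. The right
number for any given setup depends on how lossy the LAN is versus how long
a stalled client mount can be tolerated.

-----------------cut here------------------------
# Sketch only: remote1 as in the client config above, but with the
# higher ping-timeout from this thread. With a low value, a single
# lost keepalive ping can mark a healthy brick offline; with a high
# value, client I/O blocks that much longer when a brick really dies.
volume remote1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.184
  option ping-timeout 120    # seconds before the brick is declared dead; was 5
  option remote-subvolume locks
end-volume
-----------------cut here------------------------

The same change would apply to remote2 through remote4; each drop of a ping
packet within the window merely restarts the wait, so only a sustained
outage of the full 120 seconds takes the brick offline.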