Why do you suggest such a high ping-timeout value?
When a brick gets into trouble, the mounted fs on the client side is unusable
(I/O is locked) and we have to wait 120 sec. for the timeout to "release" the fs.
Client I/O locked for 120 sec. is not acceptable.

regards,

Stephan von Krawczynski wrote:
> Try setting your ping-timeout way higher, since we use 120 we have almost no
> issues in regular use. Nevertheless we do believe every problem will come back
> when some brick(s) die...
>
>
> On Tue, 10 Nov 2009 14:59:07 +0100
> Marek Blaszkowski <mb at kis.p.lodz.pl> wrote:
>
>> OK,
>> here are some more details: on the "bad" servers (with strange lockups) we got
>> problems with opening/moving files. We are unable to open, move or just ls files
>> (file utils just hang)....
>>
>>
>> Marek wrote:
>>> Hello,
>>> we're testing a simple configuration of glusterfs 2.0.7 with 4 servers
>>> and 1 client (2+2 bricks, each pair replicated, with
>>> a distribute translator on top, configs below).
>>> During our tests (client side copying/moving a lot of small files on
>>> the glusterfs mounted FS) we got strange
>>> lockups on two of the servers (bricks).
>>> I was unable to log in (via ssh) to the server; on already started terminal
>>> sessions I couldn't spawn a "top"
>>> process (it just hangs), and vmstat exits with a floating point exception.
>>> Other fileutils commands behave "normally".
>>> There were no dmesg kernel messages (first guess was a kernel oops or
>>> other kernel related problems).
>>> This server never had any CPU/memory problems under high load before.
>>> Problems start when we
>>> run glusterfsd on this server. We had to hard reset the malfunctioning server
>>> (reboot doesn't work).
>>> After a couple of hours of testing another server disconnected from the client
>>> (according to the client debug log).
>>> The scenario was the same:
>>> 1. unable to log in to the server; the connection was established but sshd on
>>> the server side hangs/times out after entering the user password
>>> 2. on previously established terminal sessions we were unable to run the top or
>>> vmstat utility (vmstat exits with a
>>> floating point exception). Copying/moving files was OK. Load was 0.00,
>>> 0.00, 0.00
>>>
>>>
>>> What could be wrong? These servers never had problems before (simple
>>> terminal/proxy servers). The strange locking looks
>>> like it is related to kernel VM structures (why do top/vmstat behave oddly??) or
>>> other kernel related problems.
>>>
>>> Server remote1 details: Linux version 2.6.26-1-686 (Debian
>>> 2.6.26-13lenny2) (dannf at debian.org)
>>> (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Fri
>>> Mar 13 18:08:45 UTC 2009
>>> running debian 5.0
>>>
>>> Server remote2 details: Linux version 2.6.22-14-server (buildd at palmer)
>>> (gcc version 4.1.3 20070929
>>> (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Sun Oct 14 23:34:23 GMT 2007
>>> running ubuntu
>>> both run glusterfsd:
>>> /usr/local/sbin/glusterfsd -p /var/run/glusterfsd.pid -f
>>> /usr/local/etc/glusterfs/glusterfs-server.vol
>>>
>>>
>>> Note that both servers run different OS versions and got similar
>>> lockup problems, never having had problems
>>> before (without glusterfsd).
>>>
>>>
>>> Server gluster config file (the same on 4 servers):
>>> -----------------cut here------------------------
>>> volume brick
>>>   type storage/posix
>>>   option directory /var/gluster
>>> end-volume
>>>
>>> volume locks
>>>   type features/posix-locks
>>>   subvolumes brick
>>> end-volume
>>>
>>> volume server
>>>   type protocol/server
>>>   option transport-type tcp/server
>>>   option auth.ip.locks.allow *
>>>   option auth.ip.brick-ns.allow *
>>>   subvolumes locks
>>> end-volume
>>> -----------------cut here-----------------------
>>>
>>> client gluster config below (please note remote1 and remote4 got the
>>> problems mentioned above), the gluster client was
>>> started with the command:
>>> glusterfs --log-file=/var/log/gluster-client -f
>>> /usr/local/etc/glusterfs/glusterfs-client.vol /var/glustertest
>>>
>>>
>>> -----------------client config-cut here-----------------------
>>> volume remote1
>>>   type protocol/client
>>>   option transport-type tcp/client
>>>   option remote-host 192.168.2.184
>>>   option ping-timeout 5
>>>   option remote-subvolume locks
>>> end-volume
>>>
>>> volume remote2
>>>   type protocol/client
>>>   option transport-type tcp/client
>>>   option remote-host 192.168.2.195
>>>   option ping-timeout 5
>>>   option remote-subvolume locks
>>> end-volume
>>>
>>> volume remote3
>>>   type protocol/client
>>>   option transport-type tcp/client
>>>   option remote-host 192.168.2.145
>>>   option ping-timeout 5
>>>   option remote-subvolume locks
>>> end-volume
>>>
>>> volume remote4
>>>   type protocol/client
>>>   option transport-type tcp/client
>>>   option remote-host 192.168.2.193
>>>   option ping-timeout 5
>>>   option remote-subvolume locks
>>> end-volume
>>>
>>> volume afr1
>>>   type cluster/replicate
>>>   subvolumes remote1 remote3
>>> end-volume
>>>
>>> volume afr2
>>>   type cluster/replicate
>>>   subvolumes remote2 remote4
>>> end-volume
>>>
>>>
>>> volume bigfs
>>>   type cluster/distribute
>>>   subvolumes afr1 afr2
>>> end-volume
>>>
>>> volume writebehind
>>>   type performance/write-behind
>>>   option flush-behind on
>>>   option cache-size 3MB
>>>   subvolumes bigfs
>>> end-volume
>>>
>>> volume readahead
>>>   type performance/read-ahead
>>>   option page-count 16
>>>   subvolumes writebehind
>>> end-volume
>>> -----------------cut here--------------------------------------
>>>
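
For anyone comparing ping-timeout settings across client volfiles like the one quoted above, here is a minimal Python sketch that lists the ping-timeout of every protocol/client volume. It is not part of glusterfs: parse_volfile and ping_timeouts are hypothetical helper names, and the parser only assumes the "volume / type / option / subvolumes / end-volume" layout seen in this thread, with "#" treated as a comment marker. A volume without the option is reported as None rather than guessing a default.

```python
def parse_volfile(text):
    """Parse 'volume ... end-volume' blocks into a dict keyed by volume name."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        # Assumption: '#' starts a comment; strip it and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        words = line.split()
        if words[0] == "volume" and len(words) == 2:
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif words[0] == "end-volume":
            current = None
        elif current is None:
            continue  # ignore anything outside a volume block
        elif words[0] == "type" and len(words) == 2:
            current["type"] = words[1]
        elif words[0] == "option" and len(words) >= 3:
            current["options"][words[1]] = " ".join(words[2:])
        elif words[0] == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes


def ping_timeouts(volumes):
    """Return {volume name: ping-timeout in seconds, or None if not set}."""
    result = {}
    for name, vol in volumes.items():
        if vol["type"] == "protocol/client":
            value = vol["options"].get("ping-timeout")
            result[name] = int(value) if value is not None else None
    return result


# Two volumes copied from the client config in this thread.
SAMPLE = """
volume remote1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.184
  option ping-timeout 5
  option remote-subvolume locks
end-volume

volume afr1
  type cluster/replicate
  subvolumes remote1 remote3
end-volume
"""

if __name__ == "__main__":
    for name, seconds in ping_timeouts(parse_volfile(SAMPLE)).items():
        print(name, seconds)
```

Run against the full client volfile, this gives one line per remote brick, which makes it easy to see how long client I/O could stay blocked on each one before the timeout fires.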