OK, here are some more details: on the "bad" servers (the ones with the strange lockups) we have problems opening and moving files. We are unable to open, move, or even ls files (the file utilities just hang).

Marek wrote:
> Hello,
> we're testing a simple configuration of glusterfs 2.0.7 with 4 servers
> and 1 client (2+2 bricks, each pair replicated, with a distribute
> translator on top; configs below).
> During our tests (client-side copying/moving of a lot of small files on
> the glusterfs-mounted FS) we got strange lockups on two of the servers
> (bricks).
> I was unable to log in (via ssh) to the server; on already-open terminal
> sessions I couldn't spawn a "top" process (it just hangs), and vmstat
> exits with a floating point exception. Other fileutils commands behave
> normally. There were no dmesg kernel messages (my first guess was a
> kernel oops or some other kernel-related problem). This server never had
> any CPU/memory problems under high load before. The problems start when
> we run glusterfsd on this server. We had to hard-reset the
> malfunctioning server (reboot didn't work).
> After a couple of hours of testing, another server disconnected from the
> client (according to the client debug log). The scenario was the same:
> 1. unable to log in to the server; the connection was established, but
> sshd on the server side hung/timed out after entering the user password
> 2. on previously established terminal sessions I was unable to run the
> top or vmstat utilities (vmstat exits with a floating point exception).
> Copying/moving files was OK. Load was 0.00, 0.00, 0.00
>
> What could be wrong? These servers never had problems before (simple
> terminal/proxy servers). The strange locking looks like it is related to
> kernel VM structures (why do top/vmstat behave oddly??) or some other
> kernel-related problem.
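Utilities that hang without consuming CPU usually mean processes blocked in uninterruptible sleep inside the kernel. A quick diagnostic sketch (not from the original thread) is to list any D-state tasks on an affected server while the hang is happening:

```shell
# List tasks in uninterruptible sleep (state "D"); these are blocked
# inside the kernel (typically on I/O) and are the classic cause of
# fileutils commands that hang and cannot be killed.
# The wchan column shows the kernel function each task is waiting in.
ps axo stat,pid,wchan:30,comm | awk 'NR == 1 || $1 ~ /^D/'
```

If the hung glusterfsd, top, or cp processes show up here, the wchan values would point at the kernel path they are stuck in.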
>
> Server remote1 details: Linux version 2.6.26-1-686 (Debian
> 2.6.26-13lenny2) (dannf at debian.org) (gcc version 4.1.3 20080704
> (prerelease) (Debian 4.1.2-25)) #1 SMP Fri Mar 13 18:08:45 UTC 2009,
> running Debian 5.0
>
> Server remote2 details: Linux version 2.6.22-14-server (buildd at palmer)
> (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP
> Sun Oct 14 23:34:23 GMT 2007, running Ubuntu
>
> Both run glusterfsd:
> /usr/local/sbin/glusterfsd -p /var/run/glusterfsd.pid -f
> /usr/local/etc/glusterfs/glusterfs-server.vol
>
> Note that the two servers run different OS versions and show similar
> lockup problems, having never had any problems before (without
> glusterfsd).
>
> Server gluster config file (the same on all 4 servers):
> -----------------cut here------------------------
> volume brick
>   type storage/posix
>   option directory /var/gluster
> end-volume
>
> volume locks
>   type features/posix-locks
>   subvolumes brick
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   option auth.ip.locks.allow *
>   option auth.ip.brick-ns.allow *
>   subvolumes locks
> end-volume
> -----------------cut here-----------------------
>
> Client gluster config below (please note that remote1 and remote4 had
> the problems mentioned above). The gluster client was started with the
> command:
> glusterfs --log-file=/var/log/gluster-client -f
> /usr/local/etc/glusterfs/glusterfs-client.vol /var/glustertest
>
> -----------------client config-cut here-----------------------
> volume remote1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.2.184
>   option ping-timeout 5
>   option remote-subvolume locks
> end-volume
>
> volume remote2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.2.195
>   option ping-timeout 5
>   option remote-subvolume locks
> end-volume
>
> volume remote3
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.2.145
>   option ping-timeout 5
>   option remote-subvolume locks
> end-volume
>
> volume remote4
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.2.193
>   option ping-timeout 5
>   option remote-subvolume locks
> end-volume
>
> volume afr1
>   type cluster/replicate
>   subvolumes remote1 remote3
> end-volume
>
> volume afr2
>   type cluster/replicate
>   subvolumes remote2 remote4
> end-volume
>
> volume bigfs
>   type cluster/distribute
>   subvolumes afr1 afr2
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option flush-behind on
>   option cache-size 3MB
>   subvolumes bigfs
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-count 16
>   subvolumes writebehind
> end-volume
> -----------------cut here--------------------------------------
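Since remote1 and remote4 dropped off the client with ping-timeout 5, a first client-side check is raw TCP reachability of each brick. A minimal sketch, assuming nc is installed and that the bricks listen on GlusterFS 2.0's default protocol/server port 6996 (an assumption; check for an "option listen-port" line in your server volfile):

```shell
#!/bin/sh
# Probe each brick's listen port from the client. The port 6996 below is
# assumed (GlusterFS 2.0.x default); adjust if your server volfile
# overrides it with "option listen-port".
check_bricks() {
    port=$1; shift
    for host in "$@"; do
        if nc -z -w 2 "$host" "$port" 2>/dev/null; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
        fi
    done
}

# The four bricks from the client config above:
check_bricks 6996 192.168.2.184 192.168.2.195 192.168.2.145 192.168.2.193
```

If a brick is TCP-reachable but the client still logs disconnects, the problem is above the network layer, which would fit the server-side kernel hangs described earlier; a ping-timeout of 5 seconds is also aggressive enough that a briefly stalled server gets dropped.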