Strange server lockup issues with 2.0.7

mb at kis.p.lodz.pl (Marek) · Tue, 10 Nov 2009 14:10:44 +0100

Hello,
we're testing a simple glusterfs 2.0.7 configuration with 4 servers and 1 client (2+2 bricks, each pair replicated,
with a distribute translator on top; configs below).
During our tests (client-side copying/moving of a lot of small files on the glusterfs-mounted FS) we got strange
lockups on two of the servers (bricks).
I was unable to log in (via ssh) to the server. In already-open terminal sessions I couldn't start a "top"
process (it just hangs), and vmstat exits with a floating point exception. Other fileutils commands behave normally.
There were no kernel messages in dmesg (my first guess was a kernel oops or some other kernel-related problem).
This server never had any CPU/memory problems under high load before. The problems started when we
ran glusterfsd on this server. We had to hard-reset the malfunctioning server (reboot doesn't work).
After a couple of hours of testing, another server disconnected from the client (according to the client debug log).
The scenario was the same:
1. unable to log in to the server: the connection was established, but sshd on the server side hung/timed out after the user password was entered
2. in previously established terminal sessions I was unable to run top or vmstat (vmstat exits with a
floating point exception). Copying/moving files was OK. Load was 0.00, 0.00, 0.00

What could be wrong? These servers never had problems before (simple terminal/proxy servers). The strange lockup looks
related to kernel VM structures (why else would top/vmstat behave oddly?) or to some other kernel-related problem.
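
Since top and vmstat both get their data from /proc, one thing that could be tried the next time a server is in this
state is to read /proc directly and list the processes stuck in uninterruptible sleep (state "D"). Below is a minimal
sketch of that, assuming Python 3 is available on the server and that /proc is still readable at all (a read may
itself hang on a machine in this state). The names parse_stat_line and blocked_processes are made up for this example.

```python
import os

def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so we cut at the LAST ')' instead of splitting on whitespace.
    """
    left, right = line.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    return int(pid_text), comm, right.split()[0]

def blocked_processes(proc_root="/proc"):
    """Return (pid, comm) for every process in uninterruptible sleep ('D')."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked

if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

If glusterfsd (or sshd) shows up in that list, it would at least tell us which processes are blocked inside the kernel.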

Server remote1 details: Linux version 2.6.26-1-686 (Debian 2.6.26-13lenny2) (dannf at debian.org)
(gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Fri Mar 13 18:08:45 UTC 2009
running debian 5.0

Server remote2 details: Linux version 2.6.22-14-server (buildd at palmer) (gcc version 4.1.3 20070929
(prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Sun Oct 14 23:34:23 GMT 2007
running ubuntu
both run glusterfsd:
  /usr/local/sbin/glusterfsd -p /var/run/glusterfsd.pid -f /usr/local/etc/glusterfs/glusterfs-server.vol

Note that the two servers run different OS versions and got similar lockup problems, never having had problems
before (without glusterfsd).

Server gluster config file (the same on all 4 servers):
-----------------cut here------------------------
volume brick
type storage/posix
option directory /var/gluster
end-volume

volume locks
type features/posix-locks
subvolumes brick
end-volume

volume server
type protocol/server
option transport-type tcp/server
option auth.ip.locks.allow *
option auth.ip.brick-ns.allow *
subvolumes locks
end-volume
-----------------cut here-----------------------

Client gluster config below (please note that remote1 and remote4 had the problems mentioned above). The gluster client was
started with the command:
glusterfs --log-file=/var/log/gluster-client -f /usr/local/etc/glusterfs/glusterfs-client.vol /var/glustertest

-----------------client config-cut here-----------------------
volume remote1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.2.184
option ping-timeout 5
option remote-subvolume locks
end-volume

volume remote2
type protocol/client
option transport-type tcp/client
option remote-host 192.168.2.195
option ping-timeout 5
option remote-subvolume locks
end-volume

volume remote3
type protocol/client
option transport-type tcp/client
option remote-host 192.168.2.145
option ping-timeout 5
option remote-subvolume locks
end-volume

volume remote4
type protocol/client
option transport-type tcp/client
option remote-host 192.168.2.193
option ping-timeout 5
option remote-subvolume locks
end-volume

volume afr1
type cluster/replicate
subvolumes remote1 remote3
end-volume

volume afr2
type cluster/replicate
subvolumes remote2 remote4
end-volume

volume bigfs
type cluster/distribute
subvolumes afr1 afr2
end-volume

volume writebehind
type performance/write-behind
option flush-behind on
option cache-size 3MB
subvolumes bigfs
end-volume

volume readahead
type performance/read-ahead
option page-count 16
subvolumes writebehind
end-volume
-----------------cut here--------------------------------------
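
As a cross-check of the layout, here is a minimal sketch that reads a volfile like the one above and reports which
remote hosts end up in which replicate pair. It is not part of glusterfs; parse_volfile and replica_hosts are names
invented for this example, and the parser only understands the volume/type/option/subvolumes/end-volume lines used here.

```python
import sys

def parse_volfile(text):
    """Parse 'volume ... end-volume' blocks into {name: {type, options, subvolumes}}."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        key = words[0]
        if key == "volume" and len(words) > 1:
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif key == "end-volume":
            current = None
        elif current is None:
            continue  # text outside any volume block, e.g. "cut here" markers
        elif key == "type" and len(words) > 1:
            current["type"] = words[1]
        elif key == "option" and len(words) > 2:
            current["options"][words[1]] = " ".join(words[2:])
        elif key == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes

def replica_hosts(volumes):
    """Map each cluster/replicate volume to the remote-host of each subvolume."""
    result = {}
    for name, vol in volumes.items():
        if vol["type"] == "cluster/replicate":
            result[name] = [
                volumes[sub]["options"].get("remote-host")
                for sub in vol["subvolumes"]
                if sub in volumes
            ]
    return result

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for pair, hosts in replica_hosts(parse_volfile(fh.read())).items():
            print(pair, hosts)
```

Run against the client config above, it maps afr1 to 192.168.2.184 and 192.168.2.145, and afr2 to 192.168.2.195 and
192.168.2.193, so remote1 and remote4 sit in different replicate pairs.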

-- 
regards,
Marek B.