Hello,

we're testing a simple configuration of glusterfs 2.0.7 with 4 servers and 1 client (2+2 bricks, each pair replicated, with a distribute translator on top; configs below).

During our tests (copying/moving a lot of small files on the glusterfs-mounted FS from the client side) we got strange lockups on two of the servers (bricks). I was unable to log in (via ssh) to the server; in already-open terminal sessions I couldn't spawn a "top" process (it just hangs), and vmstat exits with a floating point exception. Other fileutils commands behave normally. There were no kernel messages in dmesg (my first guess was a kernel oops or another kernel-related problem). This server never had any CPU/memory problems under high load before. The problems start when we run glusterfsd on this server. We had to hard-reset the malfunctioning server (reboot doesn't work).

After a couple of hours of testing another server disconnected from the client (according to the client debug log). The scenario was the same:

1. unable to log in to the server: the connection was established, but sshd on the server side hung/timed out after entering the user password
2. in previously established terminal sessions I was unable to run the top or vmstat utility (vmstat exits with a floating point exception). Copying/moving files was OK. Load was 0.00, 0.00, 0.00

What could be wrong? These servers never had problems before (simple terminal/proxy servers). The strange locking looks related to kernel VM structures (why do top/vmstat behave oddly?) or some other kernel-related problem.
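
Since top hangs and vmstat crashes, one way to capture state the next time this happens is to read the underlying /proc files directly from an already-open session. This is only a minimal sketch: the function name snapshot and its optional directory argument are made up for illustration, and the file list assumes (as far as I know) that these are among the files the procps tools read. If one of the reads itself hangs, that would be a useful data point too.

```shell
#!/bin/sh
# Hypothetical helper (not part of glusterfs or procps): dump the raw
# counters that top/vmstat would normally summarise.
# $1 is an optional root directory, defaulting to /proc; it exists only
# so the function can be pointed at a saved copy or a test directory.
snapshot() {
    proc_root="${1:-/proc}"
    for f in loadavg meminfo vmstat stat; do
        echo "== $f =="
        # Print the file, or a marker if it is missing or unreadable.
        cat "$proc_root/$f" 2>/dev/null || echo "(unreadable)"
    done
}
```

It could be run as "snapshot > /tmp/snapshot.txt" in a session opened before the lockup, and the output compared against a healthy server.
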

Server remote1 details:

Linux version 2.6.26-1-686 (Debian 2.6.26-13lenny2) (dannf at debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Fri Mar 13 18:08:45 UTC 2009
running Debian 5.0

Server remote2 details:

Linux version 2.6.22-14-server (buildd at palmer) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Sun Oct 14 23:34:23 GMT 2007
running Ubuntu

Both run glusterfsd:

/usr/local/sbin/glusterfsd -p /var/run/glusterfsd.pid -f /usr/local/etc/glusterfs/glusterfs-server.vol

Note that the two servers run different OS versions and got similar lockup problems, having never had problems before (without glusterfsd).

Server gluster config file (the same on all 4 servers):

-----------------cut here------------------------
volume brick
  type storage/posix
  option directory /var/gluster
end-volume

volume locks
  type features/posix-locks
  subvolumes brick
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.locks.allow *
  option auth.ip.brick-ns.allow *
  subvolumes locks
end-volume
-----------------cut here-----------------------

Client gluster config below (please note remote1 and remote4 got the problems mentioned above). The gluster client was started with the command:

glusterfs --log-file=/var/log/gluster-client -f /usr/local/etc/glusterfs/glusterfs-client.vol /var/glustertest

-----------------client config-cut here-----------------------
volume remote1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.184
  option ping-timeout 5
  option remote-subvolume locks
end-volume

volume remote2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.195
  option ping-timeout 5
  option remote-subvolume locks
end-volume

volume remote3
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.145
  option ping-timeout 5
  option remote-subvolume locks
end-volume

volume remote4
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.193
  option ping-timeout 5
  option remote-subvolume locks
end-volume

volume afr1
  type cluster/replicate
  subvolumes remote1 remote3
end-volume

volume afr2
  type cluster/replicate
  subvolumes remote2 remote4
end-volume

volume bigfs
  type cluster/distribute
  subvolumes afr1 afr2
end-volume

volume writebehind
  type performance/write-behind
  option flush-behind on
  option cache-size 3MB
  subvolumes bigfs
end-volume

volume readahead
  type performance/read-ahead
  option page-count 16
  subvolumes writebehind
end-volume
-----------------cut here--------------------------------------

--
regards,
Marek B.