Strange server lock issues with 2.0.7 - updating

mb at kis.p.lodz.pl (Marek) · Fri, 20 Nov 2009 14:18:46 +0100

Why do you suggest such a high ping-timeout value?
When a brick gets into trouble, the fs mounted on the client side becomes unusable (I/O is blocked)
and we have to wait 120 sec. for the timeout to expire and "release" the fs.
Blocking client I/O for 120 sec. is not acceptable.
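
For reference, ping-timeout is set per protocol/client volume in the client volfile (ours is currently 5). Below is a sketch of one client volume with an intermediate value, reusing our remote1 definition. The value 30 is only an assumed example, not a tested recommendation:

-----------------cut here------------------------
volume remote1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.2.184
# assumed example value, between our current 5 and the suggested 120
option ping-timeout 30
option remote-subvolume locks
end-volume
-----------------cut here------------------------

With a value like this, client I/O should block for roughly 30 sec. at most before the client gives up on an unresponsive brick, instead of 120 sec.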

regards,

Stephan von Krawczynski wrote:
> Try setting your ping-timeout way higher; since we started using 120 we have had
> almost no issues in regular use. Nevertheless, we do believe every problem will
> come back when some brick(s) die...
> 
> 
> On Tue, 10 Nov 2009 14:59:07 +0100
> Marek Blaszkowski <mb at kis.p.lodz.pl> wrote:
> 
>> OK,
>> here are some more details: on the "bad" servers (with the strange lockups) we have
>> problems opening/moving files. We are unable to open, move or even ls files
>> (the file utils just hang)....
>>
>>
>> Marek wrote:
>>> Hello,
>>> we're testing a simple glusterfs 2.0.7 configuration with 4 servers
>>> and 1 client (2+2 bricks, each pair replicated, with a distribute
>>> translator on top; configs below).
>>> During our tests (client side copying/moving a lot of small files on
>>> the glusterfs-mounted FS) we got strange lockups on two of the
>>> servers (bricks).
>>> I was unable to log in (via ssh) to the server, and in already open terminal
>>> sessions I couldn't spawn a "top" process (it just hangs); vmstat exits
>>> with a floating point exception.
>>> Other fileutils commands behave normally.
>>> There were no kernel messages in dmesg (my first guess was a kernel oops or
>>> other kernel-related problems).
>>> This server never had any CPU/memory problems under high load before.
>>> The problems start when we run glusterfsd on this server. We had to
>>> hard-reset the malfunctioning server (reboot doesn't work).
>>> After a couple of hours of testing, another server disconnected from the client
>>> (according to the client debug log).
>>> The scenario was the same:
>>> 1. unable to log in to the server: the connection was established, but sshd on
>>> the server side hung/timed out after the user password was entered
>>> 2. in previously established terminal sessions we were unable to run the top or
>>> vmstat utility (vmstat exits with a floating point exception).
>>> Copying/moving files was OK. Load was 0.00, 0.00, 0.00
>>>
>>>
>>> What could be wrong? These servers never had problems before (simple
>>> terminal/proxy servers). The strange locking looks
>>> related to kernel VM structures (why do top/vmstat behave oddly??) or
>>> other kernel-related problems.
>>>
>>> Server remote1 details: Linux version 2.6.26-1-686 (Debian 
>>> 2.6.26-13lenny2) (dannf at debian.org)
>>> (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Fri 
>>> Mar 13 18:08:45 UTC 2009
>>> running Debian 5.0
>>>
>>> Server remote2 details: Linux version 2.6.22-14-server (buildd at palmer) 
>>> (gcc version 4.1.3 20070929
>>> (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Sun Oct 14 23:34:23 GMT 2007
>>> running Ubuntu
>>> Both run glusterfsd:
>>>  /usr/local/sbin/glusterfsd -p /var/run/glusterfsd.pid -f /usr/local/etc/glusterfs/glusterfs-server.vol
>>>
>>>
>>> Note that the two servers run different OS versions and got similar
>>> lockup problems, having never had problems
>>> before (without glusterfsd).
>>>
>>>
>>> Server gluster config file (the same on all 4 servers):
>>> -----------------cut here------------------------
>>> volume brick
>>> type storage/posix
>>> option directory /var/gluster
>>> end-volume
>>>
>>> volume locks
>>> type features/posix-locks
>>> subvolumes brick
>>> end-volume
>>>
>>> volume server
>>> type protocol/server
>>> option transport-type tcp/server
>>> option auth.ip.locks.allow *
>>> option auth.ip.brick-ns.allow *
>>> subvolumes locks
>>> end-volume
>>> -----------------cut here-----------------------
>>>
>>> Client gluster config below (please note remote1 and remote4 have the
>>> problems mentioned above). The gluster client was
>>> started with the command:
>>> glusterfs --log-file=/var/log/gluster-client -f /usr/local/etc/glusterfs/glusterfs-client.vol /var/glustertest
>>>
>>>
>>> -----------------client config-cut here-----------------------
>>> volume remote1
>>> type protocol/client
>>> option transport-type tcp/client
>>> option remote-host 192.168.2.184
>>> option ping-timeout 5
>>> option remote-subvolume locks
>>> end-volume
>>>
>>> volume remote2
>>> type protocol/client
>>> option transport-type tcp/client
>>> option remote-host 192.168.2.195
>>> option ping-timeout 5
>>> option remote-subvolume locks
>>> end-volume
>>>
>>> volume remote3
>>> type protocol/client
>>> option transport-type tcp/client
>>> option remote-host 192.168.2.145
>>> option ping-timeout 5
>>> option remote-subvolume locks
>>> end-volume
>>>
>>> volume remote4
>>> type protocol/client
>>> option transport-type tcp/client
>>> option remote-host 192.168.2.193
>>> option ping-timeout 5
>>> option remote-subvolume locks
>>> end-volume
>>>
>>> volume afr1
>>> type cluster/replicate
>>> subvolumes remote1 remote3
>>> end-volume
>>>
>>> volume afr2
>>> type cluster/replicate
>>> subvolumes remote2 remote4
>>> end-volume
>>>
>>>
>>> volume bigfs
>>> type cluster/distribute
>>> subvolumes afr1 afr2
>>> end-volume
>>>
>>> volume writebehind
>>> type performance/write-behind
>>> option flush-behind on
>>> option cache-size 3MB
>>> subvolumes bigfs
>>> end-volume
>>>
>>> volume readahead
>>> type performance/read-ahead
>>> option page-count 16
>>> subvolumes writebehind
>>> end-volume
>>> -----------------cut here--------------------------------------
>>>