I'm experiencing various lock-up issues, ranging from Gluster locking up (running 'ls' on the mount hangs) to the whole machine locking up under load. My current config is below (two servers, replicating with AFR). I would love to get to the bottom of this, because it seems very strange that we should see erratic behaviour on such a simple setup.

There is approx 12 GB of files, and to stress test (and heal) I run ls -alR on the mount. This will run for a while and eventually lock up Gluster, and occasionally the machine. I have found that in some cases killing Gluster and re-mounting does not solve the problem (perhaps both servers have entered a locked state in some way).

I'm finding it very hard to collect any debugging information of use, as there is no crash log and no errors in the volume log. Can anyone suggest what I might be able to do to extract more information as to what is occurring at lock-up time?

volume posix
  type storage/posix
  option directory /home/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  subvolumes locks
  option autoscaling on
  option min-threads 8
  option max-threads 32
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

volume latsrv2
  type protocol/client
  option transport-type tcp
  option remote-host latsrv2
  option remote-subvolume brick
end-volume

volume afr
  type cluster/replicate
  subvolumes brick latsrv2
  option read-subvolume brick
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 2MB
  subvolumes afr
end-volume

volume cache
  type performance/io-cache
  option cache-size 32MB
  option priority *.pyc:4,*.html:3,*.php:2,*:1
  option cache-timeout 5
  subvolumes writebehind
end-volume
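For reference, this is roughly what I have been trying in order to capture more detail at lock-up time. The volfile path, log path and mount point below are placeholders for my actual ones, and I'm assuming the usual glusterfs client options and standard Linux tools here, so treat it as a sketch rather than exactly what I ran:

  # remount the client with verbose logging to a dedicated file
  glusterfs -f /etc/glusterfs/client.vol \
            --log-level=DEBUG \
            --log-file=/var/log/glusterfs/client-debug.log \
            /mnt/gluster

  # once it wedges, grab a backtrace of every thread in the client process
  gdb -p $(pidof glusterfs) -batch -ex "thread apply all bt"

  # and ask the kernel to dump blocked (D-state) tasks to the kernel log
  echo w > /proc/sysrq-trigger
  dmesg | tail -n 100

So far the DEBUG log shows nothing obvious at the moment of the hang. Is this the right sort of approach, or is there a better way to get state out of the server processes themselves?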