GlusterFS replication hangs (deadlock?) when 2 nodes attempt to create/delete the same file at the same time

demers.jonathan at gmail.com (Jonathan Demers) · Mon, 28 Mar 2011 13:35:35 -0400

Hi guys,

We have set up GlusterFS replication (mirror) with 2 nodes (latest version
3.1.3). Each node runs both the server process and the client process. We
have stripped the configuration down to the minimum.

Client configuration (same for both nodes):

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host glusterfs1
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host glusterfs2
  option remote-subvolume brick
end-volume

volume replicate
  type cluster/replicate
  subvolumes remote1 remote2
end-volume
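
A note on the client side: the volumes above use the default frame-timeout,
which is 30 minutes. While testing, a shorter value should make a stuck call
return sooner, although it would not fix the underlying hang. This is only a
sketch, assuming frame-timeout is set per protocol/client volume with a value
in seconds. The value 60 is an arbitrary choice, and we have not verified
that it changes the behaviour described below:

```
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host glusterfs1
  option remote-subvolume brick
  # assumption: value is in seconds; the default corresponds to 30 minutes
  option frame-timeout 60
end-volume
```

The same line would go into remote2.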

Server configuration (same for both nodes):

volume storage
  type storage/posix
  option directory /storage
end-volume

volume brick
  type features/locks
  subvolumes storage
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow XXX.*
  subvolumes brick
end-volume

We start everything up, with the GlusterFS client mounted on /mnt/gluster. We
can see that replication works fine: we can create a file in /mnt/gluster on
one node and see it appear in /mnt/gluster on the other node. We also see
the file appear in /storage on both nodes.

However, if we go on *both* nodes and run the following script in
/mnt/gluster:

while true; do touch foo; rm foo; done

GlusterFS just hangs. Every call on /mnt/gluster hangs as well, on both
nodes. Even "ls -l /mnt/gluster" hangs. However, the storage filesystem is
just fine: we can do "ls -l /storage" and we see the file "foo". "ps" shows
that the script on each node is stuck in "rm foo". We cannot stop the
script, even with "ctrl-C" or "kill -9". After 30 minutes (the default
frame-timeout), GlusterFS unlocks, but the cluster is broken after that:
file sharing no longer works (a file created on one node does not appear on
the other node). If we manually restart the GlusterFS servers and clients,
everything is fine again. We can reproduce the problem very easily with
this simple script.
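
For anyone who wants to try this without leaving an endless loop running,
here is a bounded variant of the script. It is only a sketch: the function
name churn and its two arguments are made up for illustration, and it uses
"rm -f" so that the loop stays quiet when the other node has already removed
the file.

```shell
#!/bin/sh
# churn DIR COUNT (hypothetical helper, not part of GlusterFS):
# create and delete DIR/foo COUNT times, like the loop above but bounded.
churn() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        touch "$dir/foo"
        # -f: no error message if the other node removed foo first
        rm -f "$dir/foo"
        i=$((i + 1))
    done
    echo "completed $i iterations"
}
```

It would be run on both nodes at the same time, for example with
"churn /mnt/gluster 1000". If the hang occurs, the function blocks in rm just
like the original loop; the bound only matters when it does not.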

GlusterFS looked very promising and we planned to use it in our new HA
architecture, but the fact that a simple sequence of standard commands can
lock up the whole system is a big showstopper for us. Have you experienced
this problem before? Is there a way to fix it (through configuration or
otherwise)?

Many thanks
Jonathan