I dug out a gluster-users mailing list thread from June 2011 at
http://gluster.org/pipermail/gluster-users/2011-June/008111.html. In
this post, Marco Agostini said:
==================================================
Craig Carl told me, three days ago:
------------------------------------------------------
that happens because Gluster's self heal is a blocking operation. We
are working on a non-blocking self heal, we are hoping to ship it in
early September.
------------------------------------------------------
==================================================

It looks like even with the release of 3.3.1, self heal is still a
blocking operation. I am wondering why the official Administration
Guide doesn't warn readers about something this important for
production operation.

On Mon, Nov 26, 2012 at 5:46 PM, ZHANG Cheng <czhang.oss at gmail.com> wrote:
> Early this morning our 2-brick replicated cluster had an outage. The
> disk space on one of the brick servers (brick02) was used up. By the
> time we responded to the disk-full alert, the issue had already
> lasted for a few hours. We reclaimed some disk space and rebooted the
> brick02 server, expecting that once it came back it would self heal.
>
> It did start self healing, but after just a couple of minutes, access
> to the gluster filesystem froze. Tons of "nfs: server brick not
> responding, still trying" messages popped up in dmesg. The load
> average on the app servers went up to around 200 from the usual 0.10.
> We had to shut down the brick02 server, or stop the gluster server
> process on it, to get the cluster working again.
>
> How can we deal with this issue? Thanks in advance.
>
> Our gluster setup follows the official doc.
>
> gluster> volume info
>
> Volume Name: staticvol
> Type: Replicate
> Volume ID: fdcbf635-5faf-45d6-ab4e-be97c74d7715
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: brick01:/exports/static
> Brick2: brick02:/exports/static
>
> The underlying filesystem is xfs (on an LVM volume):
> /dev/mapper/vg_node-brick on /exports/static type xfs
> (rw,noatime,nodiratime,nobarrier,logbufs=8)
>
> The brick servers don't act as gluster clients.
>
> Our app servers are the gluster clients, mounting via NFS:
> brick:/staticvol on /mnt/gfs-static type nfs
> (rw,noatime,nodiratime,vers=3,rsize=8192,wsize=8192,addr=10.10.10.51)
>
> brick is a DNS round-robin record for brick01 and brick02.
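
For anyone else hitting this on 3.3.x, the heal commands at least make
the self-heal backlog visible while it runs. A minimal sketch, assuming
the volume name from the setup quoted above, run on one of the brick
servers:

  # entries still pending self heal
  gluster volume heal staticvol info
  # entries healed recently
  gluster volume heal staticvol info healed
  # entries the self-heal daemon failed to heal
  gluster volume heal staticvol info heal-failed

Watching the pending count shrink at least gives a rough sense of how
long the heal will take.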
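
There are also a couple of volume options that may soften the impact of
a big heal. A sketch only, not tested on this exact setup: the option
names are the stock 3.3 AFR options, and the values here are just a
starting point:

  # heal changed blocks instead of copying whole files,
  # which cuts the I/O a full heal generates
  gluster volume set staticvol cluster.data-self-heal-algorithm diff
  # cap the number of files healed in the background at once
  gluster volume set staticvol cluster.background-self-heal-count 8

Whether these would prevent the NFS freeze described above, I can't
say; they only reduce how much work the heal does at a time.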