Hi,

I'll add some info here that I keep forgetting to mention. I heal (and
sync) on a separate interface, so:

eth0: clients mount NFS
eth1: heal and sync
eth2: management

I don't think it's a 'supported' configuration, but it works very well.

Gerald

----- Original Message -----
> From: "Gerald Brandt" <gbr at majentis.com>
> To: "Joe Julian" <joe at julianfamily.org>
> Cc: gluster-users at gluster.org
> Sent: Thursday, November 29, 2012 6:26:46 AM
> Subject: Re: Self healing of 3.3.0 cause our 2 bricks replicated cluster freeze (client read/write timeout)
>
> How about an option to throttle/limit the self-heal speed? DRBD has a
> speed limit, which very effectively cuts down on the resources needed.
>
> That being said, I have not had a problem with self heal on my VM
> images. Just two days ago, I deleted all images from one brick and let
> the self heal put everything back, rebuilding the entire brick while
> VMs were running, during business hours (a disk failure forced me to
> do it).
>
> Gerald
>
> ----- Original Message -----
> > From: "Joe Julian" <joe at julianfamily.org>
> > To: gluster-users at gluster.org
> > Sent: Thursday, November 29, 2012 12:37:37 AM
> > Subject: Re: Self healing of 3.3.0 cause our 2 bricks replicated cluster freeze (client read/write timeout)
> >
> > OK, listen up, everybody. What you're experiencing is not self heal
> > being a blocking operation. You're running out of bandwidth,
> > processor, bus... whatever it is, it's not that.
> >
> > That was fixed in commit 1af420c700fbc49b65cf7faceb3270e81cd991ce.
> >
> > So please, get it out of your head that this is a feature that was
> > never added. It was. It's been tested successfully by many admins on
> > many different systems.
> >
> > Once it's out of your head that it's a missing feature, PLEASE try
> > to figure out why YOUR system is showing the behavior that you're
> > experiencing. I can't do it; it's not failing for me. Then file a
> > bug report explaining it so these very intelligent guys can figure
> > out a solution. I've seen how that works. When Avati sees a problem,
> > he'll be sitting on the floor in a hallway because it has WiFi and
> > an outlet, and he won't even notice that everyone else has gone to
> > lunch, come back, gone to several panels, come back again, and that
> > the expo hall is starting to clear out because the place is closing.
> > He's focused and dedicated. All these guys are very talented and
> > understand this stuff better than I ever can. They will fix the bug
> > if it can be identified.
> >
> > The first step is finding the actual problem, instead of pointing at
> > something that you're just guessing isn't there.
> >
> > On 11/28/2012 09:24 PM, ZHANG Cheng wrote:
> > > I dug out a gluster-users mailing-list thread from June 2011 at
> > > http://gluster.org/pipermail/gluster-users/2011-June/008111.html.
> > >
> > > In this post, Marco Agostini said:
> > > ==================================================
> > > Craig Carl told me, three days ago:
> > > ------------------------------------------------------
> > > that happens because Gluster's self heal is a blocking operation.
> > > We are working on a non-blocking self heal; we are hoping to ship
> > > it in early September.
> > > ------------------------------------------------------
> > > ==================================================
> > >
> > > It looks like even with the release of 3.3.1, self heal is still a
> > > blocking operation. I am wondering why the official Administration
> > > Guide doesn't warn the reader about such an important point for
> > > production operation.
> > >
> > >
> > > On Mon, Nov 26, 2012 at 5:46 PM, ZHANG Cheng <czhang.oss at gmail.com> wrote:
> > >> Early this morning our 2-brick replicated cluster had an outage.
> > >> The disk space on one of the brick servers (brick02) was used up.
> > >> By the time we responded to the disk-full alert, the issue had
> > >> already lasted for a few hours. We reclaimed some disk space and
> > >> rebooted the brick02 server, expecting that once it came back it
> > >> would start self-healing.
> > >>
> > >> It did start self-healing, but after just a couple of minutes,
> > >> access to the gluster filesystem froze. Tons of "nfs: server
> > >> brick not responding, still trying" messages popped up in dmesg.
> > >> The load average on the app servers went up to 200-something from
> > >> the usual 0.10. We had to shut down the brick02 server, or stop
> > >> the gluster server process on it, to get the gluster cluster
> > >> working again.
> > >>
> > >> How can we deal with this issue? Thanks in advance.
> > >>
> > >> Our gluster setup follows the official doc.
> > >>
> > >> gluster> volume info
> > >>
> > >> Volume Name: staticvol
> > >> Type: Replicate
> > >> Volume ID: fdcbf635-5faf-45d6-ab4e-be97c74d7715
> > >> Status: Started
> > >> Number of Bricks: 1 x 2 = 2
> > >> Transport-type: tcp
> > >> Bricks:
> > >> Brick1: brick01:/exports/static
> > >> Brick2: brick02:/exports/static
> > >>
> > >> The underlying filesystem is xfs (on an LVM volume):
> > >> /dev/mapper/vg_node-brick on /exports/static type xfs
> > >> (rw,noatime,nodiratime,nobarrier,logbufs=8)
> > >>
> > >> The brick servers don't act as gluster clients.
> > >>
> > >> Our app servers are the gluster clients, mounting via NFS:
> > >> brick:/staticvol on /mnt/gfs-static type nfs
> > >> (rw,noatime,nodiratime,vers=3,rsize=8192,wsize=8192,addr=10.10.10.51)
> > >>
> > >> brick is a DNS round-robin record for brick01 and brick02.
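
For anyone who wants to try Gerald's throttling idea on a 3.3.x volume
like the one above, here is a rough sketch from the CLI, using the
'staticvol' volume name from the thread. The option values are purely
illustrative, not recommendations, and option names and ranges should be
confirmed with 'gluster volume set help' on your own version:

  # See what the self-heal daemon still has queued, and whether anything
  # failed to heal or ended up split-brained.
  gluster volume heal staticvol info
  gluster volume heal staticvol info heal-failed
  gluster volume heal staticvol info split-brain

  # Example knobs that make background self-heal less aggressive: fewer
  # files healed in parallel, and fewer blocks read per heal cycle.
  gluster volume set staticvol cluster.background-self-heal-count 1
  gluster volume set staticvol cluster.self-heal-window-size 1

  # Trigger a full heal deliberately (e.g. after repopulating a wiped
  # brick), preferably outside business hours.
  gluster volume heal staticvol full

This only limits how much healing runs in parallel; whether it would
have prevented the NFS freeze described above still depends on where the
actual bottleneck (network, disk, or CPU) was on the affected bricks.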