Hi Dan,

Thank you very much for your comprehensive explanation of using rsync to
sync GlusterFS servers. I have not had an opportunity to try that solution
because my customer decided to give up on GlusterFS, but I will test it in
my lab.

Thanks,
Jimmy

On 16 May 2012 16:45, Dan Bretherton <d.a.bretherton at reading.ac.uk> wrote:

>> Hi GlusterFS users!
>>
>> I have got one replicated volume with two bricks:
>>
>> s1 ~ # gluster volume info
>>
>> Volume Name: data-ns
>> Type: Replicate
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: s1:/mnt/gluster/data-ns
>> Brick2: s2:/mnt/gluster/data-ns
>> Options Reconfigured:
>> performance.cache-refresh-timeout: 1
>> performance.io-thread-count: 32
>> auth.allow: 10.*
>> performance.cache-size: 1073741824
>>
>> There are five clients which have mounted the volume from the s1 server.
>>
>> We had a hardware failure on the s2 box about a week ago, and the s2 box
>> was down for the whole of that time; all read and write operations went
>> to s1. Now I would like to synchronise all files onto s2, which is
>> operable again. I have started the GlusterFS server and triggered the
>> self-heal process ("find with stat" on the GlusterFS mount from the s2
>> box).
>>
>> During the replication process I saw some very strange GlusterFS
>> behaviour: some clients tried to fetch a lot of files from the s2
>> server, but those files either did not exist there yet or had a size of
>> 0 bytes.
>>
>> This caused a lot of disk wait on the web servers (the clients which
>> have mounted the volume from s1), and eventually 503 HTTP responses
>> were returned.
>>
>> My question is: how can I avoid serving files from the s2 box until all
>> files have been replicated correctly from the s1 server?
>>
>> I have installed GlusterFS 3.2.6-1 from the Debian repository.
>>
>> Thank you very much in advance,
>> Jimmy
>>
>
> Dear Jimmy,
> I have had problems re-synchronising out-of-date servers myself. I posted
> the following query last year:
>
> http://gluster.org/pipermail/gluster-users/2011-October/008933.html
>
> In my case I was mainly worried about the self-heal process causing
> excessive load, which I suspected of causing my fairly low-specification
> servers to hang. Following that posting I received some advice off line
> concerning the use of rsync to re-synchronise out-of-date servers that
> have been offline for repairs for a long period of time. I was advised
> that it is safe to use rsync provided that the -X or --xattrs option is
> used to preserve extended attributes; it is also necessary to use the
> --delete option in order to delete files that were deleted from the live
> server. When I do this I disable the glusterd service while the rsync is
> taking place, although I have not been advised that this is essential.
> It is possible that files on the live server may be modified while the
> rsync is in progress, so I always follow up with a targeted self-heal in
> order to bring the repaired server fully up to date. The targeted
> self-heal procedure is described in the following Gluster Community
> article:
>
> http://community.gluster.org/a/howto-targeted-self-heal-repairing-less-than-the-whole-volume/
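>
> In your case the whole procedure would look roughly like the sketch
> below. Treat it as a sketch only: the hostnames and brick path are taken
> from your volume info, but the init script name, the client mount point
> (/mnt/data-ns) and the extra rsync options besides -X and --delete are
> just my assumptions, so adjust them to match your setup.
>
>   # On s2 (the out-of-date server), with the GlusterFS daemons stopped:
>   /etc/init.d/glusterfs-server stop   # or however your system stops glusterd
>   rsync -aX --delete s1:/mnt/gluster/data-ns/ /mnt/gluster/data-ns/
>   /etc/init.d/glusterfs-server start
>
>   # Then, from a client mount (assumed here to be /mnt/data-ns), trigger
>   # the self-heal described in the article above for the affected paths:
>   find /mnt/data-ns -noleaf -print0 | xargs --null stat >/dev/null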
>
> When the resynchronisation process completes, I have noticed that the
> volume of data in the replicated bricks can differ by up to 100MB. I
> find this a bit worrying, but I haven't had time to find out exactly
> which files are on these bricks and why the volume of data reported by
> df differs on the two servers.
>
> The problem with the rsync approach is that it can take a very long time
> if there are a large number of files to synchronise, probably because
> rsync is single-threaded. I recently had one rsync running for two weeks
> and it still hadn't finished, and I discovered that the bricks in
> question had more than 2.5 million files. I couldn't wait any longer to
> bring my repaired server back into service, so I killed the rsync,
> started glusterd, and then ran a targeted self-heal on the
> unsynchronised bricks to continue the resynchronisation. That is still
> going on now, but I am not seeing excessive load and haven't noticed any
> replication errors (although I haven't had time to check thoroughly).
> This might be because most of the file transfer had already taken place,
> or because most of the files in these particular bricks are small.
>
> My conclusion from this experience is that if a server goes down for a
> long time and becomes significantly out of date, it is best to use rsync
> (with glusterd disabled) to do as much of the file transfer as possible.
> Once that has been done, the GlusterFS self-heal mechanism can finish
> off the resynchronisation without any problematic side effects. I will
> follow that procedure next time and report any other problems or
> observations.
>
> -Dan.