That seems correct with one change: not only do I get the old file in step 5, but that old file then overwrites the newer file on the node that did not go down.

> 1) What versions are you using ?

glusterfs 3.0.2 built on Feb 7 2010 00:15:44
Repository revision: v3.0.2

> 2) Can you share your volume files ? Are they generated using volgen ?

I did generate them via volgen and then modified them because I have 3 shares, but the only modifications were renames. (vol files at end of e-mail)

> 3) Did you notice any patterns for the files where the wrong copy was picked ? like
> were they open when the node was brought down ?

I was not monitoring this.

> 4) Any other way to reproduce the problem ?

See my nfs issue below, although I don't think they are related.

> 5) Any other patterns you observed when you see the problem ?

See my nfs issue below, although I don't think they are related.

> 6) Would you have listings of problem file(s) from the replica nodes ?

No.

Also, I did something today that works on nfs but does not work in gluster.
I have a share mounted on /cs_data, with two directories in it: /cs_data/web and /cs_data/home.
I moved /cs_data/web into /cs_data/home (so I get /cs_data/home/web) and then symlinked /cs_data/web to /cs_data/home/web, like this:

cd /cs_data; mv web home; ln -s home/web

On all the clients /cs_data/web does not work anymore. If I unmount and remount, it works again.
Unfortunately, for the unmount/remount to work I have to kill things like httpd. So to do a simple directory move (because I had it in the wrong place) on a read-only directory, I have to kill my service.
I have done exactly this with an nfs mount and it did not fail at all; I did not have to kill httpd and I did not have to unmount/remount the share.
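To spell the sequence out step by step, with what each command leaves behind (same paths as above; the trailing ls is just a stand-in for whatever access httpd makes):

# on one client:
cd /cs_data
mv web home        # the directory now lives at /cs_data/home/web
ln -s home/web     # creates the symlink /cs_data/web -> home/web

# on any client, until it is unmounted and remounted:
ls /cs_data/web    # no longer works on the gluster mount; the same sequence over nfs keeps working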
------------------
--- server.vol ---
------------------

# $ /usr/bin/glusterfs-volgen -n tcb_data -p 50001 -r 1 -c /etc/glusterfs 10.0.0.24:/mnt/tcb_data 10.0.0.25:/mnt/tcb_data

######################################
# Start tcb share
######################################
volume tcb_posix
  type storage/posix
  option directory /mnt/tcb_data
end-volume

volume tcb_locks
  type features/locks
  subvolumes tcb_posix
end-volume

volume tcb_brick
  type performance/io-threads
  option thread-count 8
  subvolumes tcb_locks
end-volume

volume tcb_server
  type protocol/server
  option transport-type tcp
  option auth.addr.tcb_brick.allow *
  option transport.socket.listen-port 50001
  option transport.socket.nodelay on
  subvolumes tcb_brick
end-volume

----------------------
--- tcb client.vol ---
----------------------

volume tcb_remote_glust1
  type protocol/client
  option transport-type tcp
  option ping-timeout 5
  option remote-host 10.0.0.24
  option transport.socket.nodelay on
  option transport.remote-port 50001
  option remote-subvolume tcb_brick
end-volume

volume tcb_remote_glust2
  type protocol/client
  option transport-type tcp
  option ping-timeout 5
  option remote-host 10.0.0.25
  option transport.socket.nodelay on
  option transport.remote-port 50001
  option remote-subvolume tcb_brick
end-volume

volume tcb_mirror
  type cluster/replicate
  subvolumes tcb_remote_glust1 tcb_remote_glust2
end-volume

volume tcb_writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes tcb_mirror
end-volume

volume tcb_readahead
  type performance/read-ahead
  option page-count 4
  subvolumes tcb_writebehind
end-volume

volume tcb_iocache
  type performance/io-cache
  option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
  option cache-timeout 1
  subvolumes tcb_readahead
end-volume

volume tcb_quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 64kB
  subvolumes tcb_iocache
end-volume

volume tcb_statprefetch
  type performance/stat-prefetch
  subvolumes tcb_quickread
end-volume
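One more note on the re-sync question that started this thread (quoted below): as far as I understand 3.0.x replicate, the self-heal of a file is triggered when a client looks that file up, so once the repaired node is back, any full walk of the mount from a client should re-sync it. A sketch, assuming the tcb share is mounted at /mnt/tcb on the client (that path is made up here, adjust to the real mount point):

# force a lookup on every file in the share
find /mnt/tcb -print0 | xargs -0 stat > /dev/null

The ls -alR that Liam suggests below amounts to the same thing.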
^C

Tejas N. Bhise wrote:
> Chad, Stephan - thank you for your feedback.
>
> Just to clarify on what you wrote, do you mean to say that -
>
> 1) The setup is a replicate setup with the file being written to multiple nodes.
> 2) One of these nodes is brought down.
> 3) A replicated file with a copy on the node brought down is written to.
> 4) The other copies are updated as writes happen while this node is still down.
> 5) After this node is brought up, the client sometimes sees the old file on the node brought up
> instead of picking the file from a node that has the latest copy.
>
> If the above is correct, quick questions -
>
> 1) What versions are you using ?
> 2) Can you share your volume files ? Are they generated using volgen ?
> 3) Did you notice any patterns for the files where the wrong copy was picked ? like
> were they open when the node was brought down ?
> 4) Any other way to reproduce the problem ?
> 5) Any other patterns you observed when you see the problem ?
> 6) Would you have listings of problem file(s) from the replica nodes ?
>
> If however my understanding was not correct, then please let me know with some
> examples.
>
> Regards,
> Tejas.
>
> ----- Original Message -----
> From: "Chad" <ccolumbu at hotmail.com>
> To: "Stephan von Krawczynski" <skraw at ithnet.com>
> Cc: gluster-users at gluster.org
> Sent: Sunday, March 7, 2010 9:32:27 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
> Subject: Re: How to re-sync
>
> I actually do prefer top post.
>
> Well, this "overwritten" behavior is what I saw as well, and that is a REALLY REALLY bad thing.
> Which is why I asked my question in the first place.
>
> Is there a gluster developer out there working on this problem specifically?
> Could we add some kind of "sync done" command that has to be run manually, and until it is run the failed node is not used?
> The bottom line for me is that I would much rather run on a performance-degraded array until a sysadmin intervenes than lose any data.
>
> ^C
>
>
>
> Stephan von Krawczynski wrote:
>> I love top-post ;-)
>>
>> Generally, you are right. But in real life you cannot trust this
>> "smartness". We tried exactly this point and had to find out that the clients
>> do not always select the correct file version (i.e. the latest) automatically.
>> Our idea in the test case was to bring down a node, update its kernel and revive
>> it - just as you would like to do in the real world for a kernel update.
>> We found out that some files were taken from the downed node afterwards and
>> the new contents on the other node got in fact overwritten.
>> This does not happen generally, of course. But it does happen. We could only
>> stop this behaviour by setting "favorite-child". But that does not really help
>> a lot, since we want to take down all nodes some other day.
>> This is in fact one of our show-stoppers.
>>
>>
>> On Sun, 7 Mar 2010 01:33:14 -0800
>> Liam Slusser <lslusser at gmail.com> wrote:
>>
>>> Assuming you used raid1 (replicate), you DO bring up the new machine
>>> and start gluster. On one of your gluster mounts you run an ls -alR
>>> and it will resync the new node. The gluster clients are smart enough
>>> to get the files from the first node.
>>>
>>> liam
>>>
>>> On Sat, Mar 6, 2010 at 11:48 PM, Chad <ccolumbu at hotmail.com> wrote:
>>>> Ok, so assuming you have N glusterfsd servers (say 2, because it does not
>>>> really matter).
>>>> Now one of the servers dies.
>>>> You repair the machine and bring it back up.
>>>>
>>>> I think 2 things:
>>>> 1. You should not start glusterfsd on boot (you need to sync the HD first).
>>>> 2. When it is up, how do you re-sync it?
>>>>
>>>> Do you rsync the underlying mount points?
>>>> If it is a busy gluster cluster, it will be getting new files all the time.
>>>> So how do you sync and bring it back up safely so that clients don't connect
>>>> to an incomplete server?
>>>>
>>>> ^C
>>>
>>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
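On the "which copy do I trust" part of all this: before (or after) letting the self-heal run, the two backend copies of a suspect file can also be compared directly on the servers. A rough sketch, using the backend directory from the vol files above and a made-up file path; the trusted.afr.* attribute names depend on the client volume names, so treat the getfattr part as something to eyeball rather than a definitive check:

# run as root on both 10.0.0.24 and 10.0.0.25
stat /mnt/tcb_data/some/file                       # compare size and mtime between the two servers
md5sum /mnt/tcb_data/some/file                     # compare contents
getfattr -d -m . -e hex /mnt/tcb_data/some/file    # dumps all xattrs, including the trusted.afr.* counters that
                                                   # replicate uses (as I understand it) to pick a heal source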