That seems correct with one change: not only do I get the old file in step 5, but that old file then overwrites the newer file on the node that did not go down.

> 1) What versions are you using ?

glusterfs 3.0.2 built on Feb 7 2010 00:15:44
Repository revision: v3.0.2

> 2) Can you share your volume files ? Are they generated using volgen ?

I did generate them via volgen and then modified them because I have 3 shares, but the only modifications were renames. (vol files at end of e-mail)

> 3) Did you notice any patterns for the files where the wrong copy was picked ? like
> were they open when the node was brought down ?

I was not monitoring this.

> 4) Any other way to reproduce the problem ?

See my nfs issue below, although I don't think they are related.

> 5) Any other patterns you observed when you see the problem ?

See my nfs issue below, although I don't think they are related.

> 6) Would you have listings of problem file(s) from the replica nodes ?

No.

Also, I did something today that works on nfs but does not work in gluster.
I have a share mounted on /cs_data, with two directories in it: /cs_data/web and /cs_data/home.
I moved /cs_data/web into /cs_data/home (so I get /cs_data/home/web) and then symlinked /cs_data/web to /cs_data/home/web, like this:

cd /cs_data; mv web home; ln -s home/web

On all the clients /cs_data/web does not work anymore. If I unmount and remount, it works again.
Unfortunately, for the unmount/remount to work I have to kill things like httpd. So to do a simple directory move (because I had it in the wrong place) on a read-only directory, I have to kill my service.
I have done exactly this with an nfs mount and it did not fail at all; I did not have to kill httpd and I did not have to unmount/remount the share.
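To spell the sequence out step by step, with what each command leaves behind (same paths as above; the trailing ls is just a stand-in for whatever access httpd makes):

# on one client:
cd /cs_data
mv web home        # the directory now lives at /cs_data/home/web
ln -s home/web     # creates the symlink /cs_data/web -> home/web

# on any client, until it is unmounted and remounted:
ls /cs_data/web    # no longer works on the gluster mount; the same sequence over nfs keeps working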
------------------
--- server.vol ---
------------------

# $ /usr/bin/glusterfs-volgen -n tcb_data -p 50001 -r 1 -c /etc/glusterfs 10.0.0.24:/mnt/tcb_data 10.0.0.25:/mnt/tcb_data

######################################
# Start tcb share
######################################
volume tcb_posix
  type storage/posix
  option directory /mnt/tcb_data
end-volume

volume tcb_locks
  type features/locks
  subvolumes tcb_posix
end-volume

volume tcb_brick
  type performance/io-threads
  option thread-count 8
  subvolumes tcb_locks
end-volume

volume tcb_server
  type protocol/server
  option transport-type tcp
  option auth.addr.tcb_brick.allow *
  option transport.socket.listen-port 50001
  option transport.socket.nodelay on
  subvolumes tcb_brick
end-volume

----------------------
--- tcb client.vol ---
----------------------

volume tcb_remote_glust1
  type protocol/client
  option transport-type tcp
  option ping-timeout 5
  option remote-host 10.0.0.24
  option transport.socket.nodelay on
  option transport.remote-port 50001
  option remote-subvolume tcb_brick
end-volume

volume tcb_remote_glust2
  type protocol/client
  option transport-type tcp
  option ping-timeout 5
  option remote-host 10.0.0.25
  option transport.socket.nodelay on
  option transport.remote-port 50001
  option remote-subvolume tcb_brick
end-volume

volume tcb_mirror
  type cluster/replicate
  subvolumes tcb_remote_glust1 tcb_remote_glust2
end-volume

volume tcb_writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes tcb_mirror
end-volume

volume tcb_readahead
  type performance/read-ahead
  option page-count 4
  subvolumes tcb_writebehind
end-volume

volume tcb_iocache
  type performance/io-cache
  option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
  option cache-timeout 1
  subvolumes tcb_readahead
end-volume

volume tcb_quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 64kB
  subvolumes tcb_iocache
end-volume

volume tcb_statprefetch
  type performance/stat-prefetch
  subvolumes tcb_quickread
end-volume
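One more note on the re-sync question that started this thread (quoted below): as far as I understand 3.0.x replicate, the self-heal of a file is triggered when a client looks that file up, so once the repaired node is back, any full walk of the mount from a client should re-sync it. A sketch, assuming the tcb share is mounted at /mnt/tcb on the client (that path is made up here, adjust to the real mount point):

# force a lookup on every file in the share
find /mnt/tcb -print0 | xargs -0 stat > /dev/null

The ls -alR that Liam suggests below amounts to the same thing.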
^C

Tejas N. Bhise wrote:
> Chad, Stephan - thank you for your feedback.
>
> Just to clarify on what you wrote, do you mean to say that -
>
> 1) The setup is a replicate setup with the file being written to multiple nodes.
> 2) One of these nodes is brought down.
> 3) A replicated file with a copy on the node brought down is written to.
> 4) The other copies are updated as writes happen while this node is still down.
> 5) After this node is brought up, the client sometimes sees the old file on the node brought up
> instead of picking the file from a node that has the latest copy.
>
> If the above is correct, quick questions -
>
> 1) What versions are you using ?
> 2) Can you share your volume files ? Are they generated using volgen ?
> 3) Did you notice any patterns for the files where the wrong copy was picked ? like
> were they open when the node was brought down ?
> 4) Any other way to reproduce the problem ?
> 5) Any other patterns you observed when you see the problem ?
> 6) Would you have listings of problem file(s) from the replica nodes ?
>
> If however my understanding was not correct, then please let me know with some
> examples.
>
> Regards,
> Tejas.
>
> ----- Original Message -----
> From: "Chad" <ccolumbu at hotmail.com>
> To: "Stephan von Krawczynski" <skraw at ithnet.com>
> Cc: gluster-users at gluster.org
> Sent: Sunday, March 7, 2010 9:32:27 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
> Subject: Re: How to re-sync
>
> I actually do prefer top post.
>
> Well, this "overwritten" behavior is what I saw as well, and that is a REALLY REALLY bad thing.
> Which is why I asked my question in the first place.
>
> Is there a gluster developer out there working on this problem specifically?
> Could we add some kind of "sync done" command that has to be run manually, and until it is run the failed node is not used?
> The bottom line for me is that I would much rather run on a performance-degraded array until a sysadmin intervenes than lose any data.
>
> ^C
>
>
>
> Stephan von Krawczynski wrote:
>> I love top-post ;-)
>>
>> Generally, you are right. But in real life you cannot trust this
>> "smartness". We tried exactly this point and had to find out that the clients
>> do not always select the correct file version (i.e. the latest) automatically.
>> Our idea in the test case was to bring down a node, update its kernel and revive
>> it - just as you would like to do in the real world for a kernel update.
>> We found out that some files were taken from the downed node afterwards and
>> the new contents on the other node got in fact overwritten.
>> This does not happen generally, of course. But it does happen. We could only
>> stop this behaviour by setting "favorite-child". But that does not really help
>> a lot, since we want to take down all nodes some other day.
>> This is in fact one of our show-stoppers.
>>
>>
>> On Sun, 7 Mar 2010 01:33:14 -0800
>> Liam Slusser <lslusser at gmail.com> wrote:
>>
>>> Assuming you used raid1 (replicate), you DO bring up the new machine
>>> and start gluster. On one of your gluster mounts you run an ls -alR
>>> and it will resync the new node. The gluster clients are smart enough
>>> to get the files from the first node.
>>>
>>> liam
>>>
>>> On Sat, Mar 6, 2010 at 11:48 PM, Chad <ccolumbu at hotmail.com> wrote:
>>>> Ok, so assuming you have N glusterfsd servers (say 2, because it does not
>>>> really matter).
>>>> Now one of the servers dies.
>>>> You repair the machine and bring it back up.
>>>>
>>>> I think 2 things:
>>>> 1. You should not start glusterfsd on boot (you need to sync the HD first).
>>>> 2. When it is up, how do you re-sync it?
>>>>
>>>> Do you rsync the underlying mount points?
>>>> If it is a busy gluster cluster, it will be getting new files all the time.
>>>> So how do you sync and bring it back up safely so that clients don't connect
>>>> to an incomplete server?
>>>>
>>>> ^C
>>>
>>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
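On the "which copy do I trust" part of all this: before (or after) letting the self-heal run, the two backend copies of a suspect file can also be compared directly on the servers. A rough sketch, using the backend directory from the vol files above and a made-up file path; the trusted.afr.* attribute names depend on the client volume names, so treat the getfattr part as something to eyeball rather than a definitive check:

# run as root on both 10.0.0.24 and 10.0.0.25
stat /mnt/tcb_data/some/file                       # compare size and mtime between the two servers
md5sum /mnt/tcb_data/some/file                     # compare contents
getfattr -d -m . -e hex /mnt/tcb_data/some/file    # dumps all xattrs, including the trusted.afr.* counters that
                                                   # replicate uses (as I understand it) to pick a heal source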