On 11/07/2013 05:19 PM, Øystein Viggen wrote:
> Hi,
>
> I have a small test setup on Ubuntu 12.04, using the
> 3.4.1-ubuntu1~precise1 packages of glusterfs from the recommended PPA.
> There are two gluster servers (replicate 2) and one client. Bricks are
> 16 GB xfs -i size=512 filesystems. All servers run on vmware.
>
> I've been using the linux kernel source for some simple performance and
> stability tests with many small files. When deleting the linux kernel
> tree with rm -Rf while rebooting one glusterfs server, it seems that
> some deletes are missed, or "recreated". Here's how it goes:
>
> root@client:/mnt# rm -Rf linux-3.12
>
> At this point, I run "shutdown -r now" on one server. The deletion
> seems to keep running just fine, but just as the server comes back up, I
> get something like this on the client:
>
> rm: cannot remove `linux-3.12/arch/mips/ralink/dts': Directory not empty
>
> After the rm has run to completion:
>
> root@client:/mnt# find linux-3.12 -type f
> linux-3.12/arch/mips/ralink/dts/Makefile
>
> Sometimes it's more than one file, too. "gluster volume heal volname
> info" shows no outstanding entries.
>
> If I turn off one server before running rm, and turn it on during the rm
> run, a similar thing happens, only it seems worse. In one test, I had
> 9220 files left after rm had finished.
>
> If both servers are up during the rm run, all files are deleted as
> expected every time.
>
>
> What is happening here, and can I do something to avoid it?

It sounds like a split-brain issue. The commands below will help you figure this out:

gluster v heal <volumeName> info split-brain
gluster v heal <volumeName> info heal-failed

If you see any split-brain entries, then it is a bug. We can check with gluster-devel whether it is already fixed in the master branch or whether there is a bug filed for it in Bugzilla.
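
If you want to script that check, here is a minimal sketch. It assumes the heal info output contains one "Number of entries: N" line per brick, which is what I would expect from the 3.4 releases; please verify against your own output. The function name count_heal_entries is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-brick "Number of entries: N" lines
# read from stdin. Assumption: the heal info output uses that exact
# line format; adjust the pattern if your version prints it differently.
count_heal_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Usage on a live cluster (replace <volumeName> with your volume):
#   gluster v heal <volumeName> info split-brain | count_heal_entries
#   gluster v heal <volumeName> info heal-failed | count_heal_entries
```

A non-zero total from the split-brain query would be the case worth reporting.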
>
> I was hoping that in a replica 2 cluster, you could safely reboot one
> server at a time (with sync-up time in between) to, say, apply OS
> patches without taking the gluster volume offline.
>

Yes, this should work, but I am not sure whether there is a bug in gluster that is causing the issue for you.

As a workaround, stop or kill all gluster services on one of the machines, and make sure the glusterd service does not start automatically at the next boot (a one-time activity). Apply the patches to the OS, reboot, and then start the glusterd service. Check that the self-heal process completes all the required sync. Once this node has consistent data, you can repeat the steps on the other node.

> I'm thankful for any help.
>
> Øystein
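
For the "check the self-heal process" step in the workaround, a polling loop along these lines might help. It is only a sketch: the names wait_for_heal and heal_info are made up here, and it assumes "gluster volume heal <volumeName> info" prints a "Number of entries: N" line per brick. An empty heal info listing does not by itself prove the bricks are identical, so treat a successful return as a necessary check rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical wrapper around the real CLI call, kept separate so it can
# be replaced when trying the loop without a cluster.
heal_info() {
    gluster volume heal "$1" info
}

# Poll until heal info reports zero pending entries.
# Arguments: volume name, max polls (default 60), seconds between polls (default 10).
# Returns 0 once nothing is pending, 1 if the polls run out first.
wait_for_heal() {
    vol=$1
    tries=${2:-60}
    delay=${3:-10}
    while [ "$tries" -gt 0 ]; do
        # Assumption: one "Number of entries: N" line per brick.
        pending=$(heal_info "$vol" |
            awk -F': *' '/^Number of entries:/ { n += $2 } END { print n + 0 }')
        if [ "$pending" -eq 0 ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Usage after restarting glusterd on the patched node:
#   wait_for_heal <volumeName> && echo "safe to move on to the other node"
```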