> A scenario which should make this clear: let's say the file a.c is
> removed from a 2-node replication cluster. Something like the
> following should occur: Step 1 is to lock the resource. Step 2 is to
> record the intent to remove on each node. Step 3 is to remove the file
> on each node. Step 4 is to clear the intent from each node. Step 5 is
> to unlock the resource. Now, let's say that one node is not accessible
> during this process and it comes back up later. After it comes back
> up, a process may see that the file does not exist on node 1 but does
> exist on node 2. Should the file exist or not? I don't know if
> GlusterFS even does this correctly - but if it does, the file should
> NOT exist. There should be sufficient information, probably in the
> journal, to show that the file was *removed*, and therefore, even if
> one node still has the file, the journal tells us that the file was
> removed. The self-heal operation should remove the file from the node
> that was down as soon as the discrepancy is detected.

This is exactly how things happen inside. The file will be deleted the
next time the directory is accessed. (A rough sketch of this
reconciliation logic is appended at the end of this mail.)

> a Java program trying to use file locking failed in a GlusterFS mount
> point, but succeeded in /var/tmp,

Can you give us some more details or test cases to reproduce this? Do
you know whether the locks are flock or fcntl based? (A small test
program that exercises both lock types is also appended below.)

Avati
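
Appendix 1: a rough, simplified sketch of the self-heal decision
described above. This is not actual GlusterFS code; the struct layout
and function names are invented for illustration. The idea it shows: if
any replica's journal still carries a pending remove record, the
surviving copy is stale and must be deleted; otherwise the copy that
exists is authoritative and the missing copy is restored.

/* sketch only - not GlusterFS source; types and names are invented */
#include <stdbool.h>
#include <stdio.h>

struct replica {
    const char *name;
    bool file_exists;        /* does a.c exist on this node? */
    bool remove_intent;      /* step 2's remove record, never cleared */
};

static void self_heal(struct replica *a, struct replica *b)
{
    bool removed = a->remove_intent || b->remove_intent;

    if (a->file_exists == b->file_exists)
        return; /* replicas agree, nothing to heal */

    if (removed) {
        /* the journal proves the file was removed cluster-wide,
         * so the copy on the node that was down is stale */
        a->file_exists = b->file_exists = false;
        printf("journal records a remove: deleting stale copy\n");
    } else {
        /* no remove record: the missing copy was lost, restore it */
        a->file_exists = b->file_exists = true;
        printf("no remove record: restoring missing copy\n");
    }
}

int main(void)
{
    /* node2 was down during steps 3 and 4, so it still holds a.c and
     * node1's journal still carries the uncleared remove intent */
    struct replica node1 = { "node1", false, true  };
    struct replica node2 = { "node2", true,  false };

    self_heal(&node1, &node2);
    printf("after heal: a.c on node1=%d node2=%d\n",
           node1.file_exists, node2.file_exists);
    return 0;
}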
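
Appendix 2: a small stand-alone test for the locking report. On Linux
the JDK implements FileChannel.lock() with fcntl-style POSIX locks, so
running this once against a file on the GlusterFS mount and once under
/var/tmp should show which lock type misbehaves (the default file name
locktest.tmp is arbitrary).

/* try both flock() and fcntl() locks on the given path */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    const char *path = argc > 1 ? argv[1] : "locktest.tmp";
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* BSD-style flock lock, non-blocking */
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        printf("flock: OK\n");
    else
        perror("flock");

    /* POSIX fcntl lock over the whole file (what Java uses on Linux) */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };
    if (fcntl(fd, F_SETLK, &fl) == 0)
        printf("fcntl: OK\n");
    else
        perror("fcntl");

    close(fd);
    return 0;
}

Compile with "cc -o locktest locktest.c", then compare the output of
"./locktest /path/on/gluster/foo" against "./locktest /var/tmp/foo".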