> A scenario which should make this clear: let's say the file a.c is
> removed from a 2-node replication cluster. Something like the
> following should occur: Step 1 is to lock the resource. Step 2 is to
> record the intent to remove on each node. Step 3 is to remove the file
> on each node. Step 4 is to clear the intent from each node. Step 5 is
> to unlock the resource. Now, let's say that one node is not accessible
> during this process and it comes back up later. After it comes back
> up, a process may see that the file does not exist on node 1 but does
> exist on node 2. Should the file exist or not? I don't know if
> GlusterFS even does this correctly - but if it does, the file should
> NOT exist. There should be sufficient information, probably in the
> journal, to show that the file was *removed*, and therefore, even if
> one node still has the file, the journal tells us that the file was
> removed. The self-heal operation should remove the file from the node
> that was down as soon as the discrepancy is detected.

This is exactly how things happen inside. The file will be deleted the
next time the directory is accessed. (A rough sketch of this
reconciliation logic is appended at the end of this mail.)

> a Java program trying to use file locking failed in a GlusterFS mount
> point, but succeeded in /var/tmp,

Can you give us some more details or test cases to reproduce this? Do
you know whether the locks are flock or fcntl based? (A small test
program that exercises both lock types is also appended below.)

Avati
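
Appendix 1: a rough, simplified sketch of the self-heal decision
described above. This is not actual GlusterFS code; the struct layout
and function names are invented for illustration. The idea it shows: if
any replica's journal still carries a pending remove record, the
surviving copy is stale and must be deleted; otherwise the copy that
exists is authoritative and the missing copy is restored.

/* sketch only - not GlusterFS source; types and names are invented */
#include <stdbool.h>
#include <stdio.h>

struct replica {
    const char *name;
    bool file_exists;        /* does a.c exist on this node? */
    bool remove_intent;      /* step 2's remove record, never cleared */
};

static void self_heal(struct replica *a, struct replica *b)
{
    bool removed = a->remove_intent || b->remove_intent;

    if (a->file_exists == b->file_exists)
        return; /* replicas agree, nothing to heal */

    if (removed) {
        /* the journal proves the file was removed cluster-wide,
         * so the copy on the node that was down is stale */
        a->file_exists = b->file_exists = false;
        printf("journal records a remove: deleting stale copy\n");
    } else {
        /* no remove record: the missing copy was lost, restore it */
        a->file_exists = b->file_exists = true;
        printf("no remove record: restoring missing copy\n");
    }
}

int main(void)
{
    /* node2 was down during steps 3 and 4, so it still holds a.c and
     * node1's journal still carries the uncleared remove intent */
    struct replica node1 = { "node1", false, true  };
    struct replica node2 = { "node2", true,  false };

    self_heal(&node1, &node2);
    printf("after heal: a.c on node1=%d node2=%d\n",
           node1.file_exists, node2.file_exists);
    return 0;
}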
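
Appendix 2: a small stand-alone test for the locking report. On Linux
the JDK implements FileChannel.lock() with fcntl-style POSIX locks, so
running this once against a file on the GlusterFS mount and once under
/var/tmp should show which lock type misbehaves (the default file name
locktest.tmp is arbitrary).

/* try both flock() and fcntl() locks on the given path */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    const char *path = argc > 1 ? argv[1] : "locktest.tmp";
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* BSD-style flock lock, non-blocking */
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        printf("flock: OK\n");
    else
        perror("flock");

    /* POSIX fcntl lock over the whole file (what Java uses on Linux) */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };
    if (fcntl(fd, F_SETLK, &fl) == 0)
        printf("fcntl: OK\n");
    else
        perror("fcntl");

    close(fd);
    return 0;
}

Compile with "cc -o locktest locktest.c", then compare the output of
"./locktest /path/on/gluster/foo" against "./locktest /var/tmp/foo".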