Re: self heal problem

Ed W <lists@xxxxxxxxxxxxxx> · Sat, 27 Mar 2010 14:01:42 +0000

Hi Tejas

From the attributes you have shown, it seems that you went to the
backend directly, bypassing GlusterFS, and hand-crafted this
situation. Given the way the code is written, we do not think it is
possible to reach the state shown in your example.

The remote1 and remote2 attributes show all zeroes, which means
that there were no operations pending on any server.
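For my own understanding, here is how I read "all zeroes" (an assumption on my part, not confirmed in this thread): each changelog attribute such as trusted.afr.remote1 holds 12 bytes, made up of three big-endian 32-bit counters for pending data, metadata and entry operations. A minimal Python sketch that decodes such a value:

```python
import struct

# ASSUMPTION: a changelog value (e.g. trusted.afr.remote1) is 12 bytes holding
# three big-endian 32-bit pending counters, in the order data, metadata, entry.
def decode_changelog(value: bytes) -> dict:
    """Split a 12-byte changelog value into its three pending counters."""
    if len(value) != 12:
        raise ValueError(f"expected 12 bytes, got {len(value)}")
    data, metadata, entry = struct.unpack(">III", value)
    return {"data": data, "metadata": metadata, "entry": entry}

def has_pending(value: bytes) -> bool:
    """True if any counter is non-zero, i.e. some operation is still pending."""
    return any(decode_changelog(value).values())

# Hex digits as `getfattr -e hex` would show them, without the 0x prefix.
zeroes = bytes.fromhex("000000000000000000000000")
print(decode_changelog(zeroes))  # {'data': 0, 'metadata': 0, 'entry': 0}
print(has_pending(zeroes))       # False
```

Under that reading, all zeroes on both remote1 and remote2 means neither server thinks the other missed anything.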

Just a quick check: what happens if a file is deleted while one of the
replicas is down? Are "tombstones" created to track the deletions?

It would seem that you could cause the situation that Stephen describes 
as follows:

1) With replication up, create a file
2) Bring one replica down
3) Delete the file from the remaining replica
4) Create a new file with the same name as the one deleted in 3).  Optionally
try to make things go wrong by giving the new file an older ctime, a smaller
file size, an arbitrary inode, etc.
5) Bring the replica back up and stat the files
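To make the steps concrete, here is a toy Python model of them. It is only my sketch: each replica is a plain dict from file name to a hypothetical FileVersion record, and it says nothing about how Gluster stores things internally.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileVersion:
    """Hypothetical per-file record: just the attributes discussed above."""
    inode: int
    ctime: float
    size: int

def run_scenario():
    """Replay steps 1-5 on two toy replicas and return both of them."""
    replica_a, replica_b = {}, {}
    # 1) Replication up: the create lands on both replicas.
    original = FileVersion(inode=100, ctime=1000.0, size=4096)
    replica_a["foo"] = original
    replica_b["foo"] = original
    # 2) Replica B goes down: from here on only replica A sees operations.
    # 3) Delete the file on the surviving replica.
    del replica_a["foo"]
    # 4) Recreate it with the same name, an older ctime, a smaller size, a new inode.
    replica_a["foo"] = FileVersion(inode=200, ctime=500.0, size=10)
    # 5) Replica B comes back: both hold "foo", but they are different files.
    return replica_a, replica_b

a, b = run_scenario()
print(a["foo"] != b["foo"])  # True
```

Note that nothing left on either replica records that the original "foo" was ever deleted.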

In this situation, how will Gluster decide which version of the file to
keep?  Our intention is obviously to keep the recreated file. However,
unless Gluster is tracking deletions (dunno, is it?), surely it's
impossible to decide which file to keep simply by looking at the ctime or
other POSIX attributes of the two files?
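Here is a toy Python sketch of why I think the attributes alone are not enough. The "newest ctime wins" rule and the tombstone set are both hypothetical, my own illustration rather than Gluster's algorithm: with the older-ctime trick from step 4 the naive rule keeps the stale file, whereas a record of the deletion made in step 3 picks the recreated one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileVersion:
    """Hypothetical per-file record: just the attributes discussed above."""
    inode: int
    ctime: float
    size: int

def newest_ctime_wins(a: FileVersion, b: FileVersion) -> FileVersion:
    """Naive heuristic: keep whichever copy has the later ctime."""
    return a if a.ctime >= b.ctime else b

def resolve_with_tombstones(survivor: FileVersion, returning: FileVersion,
                            tombstones: set[int]) -> FileVersion:
    """Hypothetical rule: if the surviving replica recorded deleting the
    returning copy's inode, the returning copy is stale and loses."""
    if returning.inode in tombstones:
        return survivor
    # No deletion record: attributes alone cannot settle it, so fall back.
    return newest_ctime_wins(survivor, returning)

original = FileVersion(inode=100, ctime=1000.0, size=4096)  # copy on the replica that was down
recreated = FileVersion(inode=200, ctime=500.0, size=10)    # copy created after the delete
tombstones = {100}  # hypothetical record written when inode 100 was deleted in step 3

print(newest_ctime_wins(recreated, original) is original)                     # True (wrong file kept)
print(resolve_with_tombstones(recreated, original, tombstones) is recreated)  # True
```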

Can you comment on Gluster's algorithm in this situation, please?

Ed W