Re: Client side AFR race conditions?

Gordan Bobic <gordan@xxxxxxxxxx> · Tue, 06 May 2008 22:05:14 +0100

Martin Fick wrote:
> --- gordan@xxxxxxxxxx wrote:
>
>> Hmm... So you are saying the problem is writing
>> without locking?
>
> No, others are saying that. :)  I am saying that
> writing without locking should be supported (after all,
> how do I lock from a script? Certainly writing to
> glusterfs from a script should be supported) and
> that it can cause split brain when using AFR.

Actually, GFS does this by effectively locking a file. Locks aren't just 
flock/fcntl POSIX locks. They are lower level than that. Without this, 
you have no hope of achieving the kind of consistency a truly local FS 
gives you. And 
yes - it takes a performance hit for this. But I've yet to hear of a 
theoretical proposition that completely does away with this.
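
To illustrate why flock-style locks alone are not enough, here is a minimal 
sketch in Python (Linux/Unix only, on a local file, nothing GlusterFS 
specific). The function name demo_advisory_lock is made up for this example. 
It shows that flock() is advisory: a writer that asks for the lock is refused, 
but a writer that never asks is not stopped at all.

```python
import fcntl
import os
import tempfile


def demo_advisory_lock(path):
    """Return (polite_writer_was_refused, final_file_content)."""
    with open(path, "w") as holder:
        # First opener takes an exclusive advisory lock.
        fcntl.flock(holder, fcntl.LOCK_EX)

        # A cooperating writer asks for the lock and is refused.
        with open(path, "a") as polite:
            try:
                fcntl.flock(polite, fcntl.LOCK_EX | fcntl.LOCK_NB)
                polite_refused = False
            except BlockingIOError:
                polite_refused = True

        # A writer that never asks (e.g. a shell script doing >>)
        # writes straight through the lock.
        with open(path, "a") as rude:
            rude.write("written without asking\n")

    with open(path) as f:
        return polite_refused, f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_advisory_lock(os.path.join(d, "spool")))
```

Anything that is supposed to protect writers which never call flock() 
therefore has to sit below this level, in the filesystem itself.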

>> Should writing to a file not involve
>> an implicit lock, regardless of flock?
>
> While that might solve it, it would certainly be a
> performance hit.  At this point I would welcome even
> the slow locking solution.  Since it seems pretty easy
> to cause this problem, I would not trust AFR with, say, a
> mail spool without some way of preventing this.

Indeed - I think the only solution is implicit write-locking. open() for 
write has to lock the file across the cluster.
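
As a rough sketch of what implicit write-locking means, here is a toy model in 
Python. Everything in it is hypothetical (ClusterLockTable, open_for_write, 
the in-memory "store"); it is not how GlusterFS or GFS implement anything, 
and a real cluster lock needs network round trips and failure handling that 
are left out here. The point is only that the caller never asks for a lock: 
opening for write takes it, closing releases it.

```python
from contextlib import contextmanager


class WriteLocked(Exception):
    """Raised when another node already holds the file open for write."""


class ClusterLockTable:
    """Toy stand-in for a cluster-wide lock service (hypothetical)."""

    def __init__(self):
        self._owners = {}  # path -> node currently holding the write lock

    def acquire(self, path, node):
        if path in self._owners:
            return False
        self._owners[path] = node
        return True

    def release(self, path, node):
        if self._owners.get(path) == node:
            del self._owners[path]


@contextmanager
def open_for_write(table, store, path, node):
    """Open `path` for write on behalf of `node`, locking it implicitly."""
    if not table.acquire(path, node):
        raise WriteLocked(path)
    try:
        buf = []
        yield buf                      # caller appends strings to buf
        store[path] = "".join(buf)     # commit only if the body succeeded
    finally:
        table.release(path, node)      # lock goes away with the "close"


if __name__ == "__main__":
    table, store = ClusterLockTable(), {}
    with open_for_write(table, store, "/spool/user", "node-a") as f:
        f.append("message 1\n")
        try:
            with open_for_write(table, store, "/spool/user", "node-b"):
                pass
        except WriteLocked:
            print("node-b refused while node-a has the file open")
    print(store)
```

The performance cost mentioned above shows up in acquire(): in a real cluster 
that call cannot be a local dictionary lookup.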

And then we start needing directory metadata journalling, or else you 
have the same problem with the race condition and disconnected 
operation. Without a metadata journal to replay, there is no way to tell 
which version of a file xyz, of the several possible across several 
servers, is the most up to date / correct one (if any).
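
A minimal sketch of the idea, in Python, under the assumption that each server 
keeps an ordered journal of the operations it applied to a file. The function 
names and the journal format are invented for illustration and do not describe 
an actual GlusterFS feature. If one journal extends all the others, that 
server has the latest version and the rest can catch up by replay; if the 
journals have forked, no copy is "the" correct one.

```python
def pick_authoritative(journals):
    """journals maps server name -> list of operation ids, in applied order.

    Returns the server whose journal extends every other journal,
    or None if the journals have diverged (split brain).
    """
    best = max(journals, key=lambda server: len(journals[server]))
    longest = journals[best]
    for ops in journals.values():
        # Every other journal must be a prefix of the longest one.
        if longest[:len(ops)] != ops:
            return None
    return best


def ops_to_replay(journals, source, target):
    """Operations `target` is missing relative to `source`."""
    return journals[source][len(journals[target]):]


if __name__ == "__main__":
    behind = {"srv1": ["create", "w1"], "srv2": ["create", "w1", "w2"]}
    winner = pick_authoritative(behind)
    print(winner, ops_to_replay(behind, winner, "srv1"))

    forked = {"srv1": ["create", "w1"], "srv2": ["create", "w2"]}
    print(pick_authoritative(forked))
```

Without the journals, all you have is two differing copies of xyz, and the 
two cases above cannot be told apart.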

Gordan