Re: Re; Load balancing ...

gordan@xxxxxxxxxx · Thu, 1 May 2008 11:04:31 +0100 (BST)

On Wed, 30 Apr 2008, Martin Fick wrote:

No, no need to keep the old data around.  We only need
to remember the start and span of each changed section
along with the file version of the change!  This is
much easier and more space-efficient than snapshots.  Excuse me
for being ignorant of the actual sizes of these three
parameters, but they can't be larger than 8 bytes
each, can they?  8*3 = 24 bytes.  A 100MB journal
filesystem could store over 4 million different
file changes!

OK, you're right. I follow what you mean now. Rather than keeping all
versions, just keep (version, start, finish) entries in the log. That way
a client can see what has changed since its version and sync only the
blocks listed; if the log has rolled over, it (r)syncs the whole file.
Sounds like a good idea. The next question is where to keep the log. One
log per file? One log per directory? How to store them? Shadow files? A
separate shadow volume? A shadow volume might be a good idea because it
keeps the main mounted source directory looking exactly like a normal
directory. A (shadow volume) log should, ideally, also keep additional
sanity-check information such as file metadata (timestamps, size), to
cross-check whether something went weird and the file was changed
underneath GlusterFS. If it was, flush the log and force a full resync
of the file.
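
To make that concrete, here is a rough sketch of one way such a log
could behave. Nothing here is GlusterFS code: the names (RECORD,
FileMeta, ChangeLog, ranges_since) are made up for illustration, the
record layout assumes three unsigned 64-bit fields as per the 24-byte
estimate above, and the log lives in memory rather than in a shadow
file or volume.

```python
import struct
from collections import deque
from dataclasses import dataclass

# Assumed record layout: version, start, finish as unsigned 64-bit
# integers, i.e. 3 * 8 = 24 bytes per logged change.
RECORD = struct.Struct("<QQQ")


@dataclass(frozen=True)
class FileMeta:
    """Sanity-check metadata stored alongside the log (hypothetical)."""
    size: int
    mtime_ns: int


class ChangeLog:
    """Bounded per-file log of (version, start, finish) entries."""

    def __init__(self, capacity, meta):
        # A deque with maxlen drops the oldest entry when full,
        # which models the log rolling over.
        self.entries = deque(maxlen=capacity)
        self.version = 0
        self.meta = meta

    def record_write(self, start, finish, new_meta):
        """Log one changed byte range and remember the new metadata."""
        self.version += 1
        self.entries.append((self.version, start, finish))
        self.meta = new_meta

    def ranges_since(self, client_version, actual_meta):
        """Return the ranges a client must sync, or None for a full resync."""
        if actual_meta != self.meta:
            # The file changed underneath us: flush the log and bump the
            # version so every client is forced into a full resync.
            self.entries.clear()
            self.version += 1
            self.meta = actual_meta
            return None
        if client_version >= self.version:
            return []  # client is already up to date
        oldest = self.entries[0][0] if self.entries else self.version + 1
        if client_version + 1 < oldest:
            return None  # log has rolled over past the client's version
        return [(s, f) for v, s, f in self.entries if v > client_version]
```

A client at version N calls ranges_since(N, meta_from_stat) and either
syncs the returned ranges or, on None, falls back to (r)syncing the
whole file. Whether this sits in one log per file or in a shadow
volume is left open; the sketch only shows the decision logic.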

Gordan